Search | arXiv e-print repository

Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic A Case Study on Media Bias

Authors: Timo Spinde, Luyang Lin, Smi Hinterreiter, Isao Echizen

Abstract: This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the frameworks effecti… ▽ More This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the frameworks effectiveness, with Claude-3-sonnet achieving the best results in both relevance classification and definition extraction. Future directions include expanding datasets and applying TaxoMatic to additional domains. △ Less

Submitted 31 March, 2025; originally announced April 2025.

Journal ref: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM'25) (2025)

arXiv:2503.16674 [pdf, other]

Through the LLM Looking Glass: A Socratic Probing of Donkeys, Elephants, and Markets

Authors: Molly Kennedy, Ayyoob Imani, Timo Spinde, Hinrich Schütze

Abstract: While detecting and avoiding bias in LLM-generated text is becoming increasingly important, media bias often remains subtle and subjective, making it particularly difficult to identify and mitigate. In this study, we assess media bias in LLM-generated content and LLMs' ability to detect subtle ideological bias. We conduct this evaluation using two datasets, PoliGen and EconoLex, covering political… ▽ More While detecting and avoiding bias in LLM-generated text is becoming increasingly important, media bias often remains subtle and subjective, making it particularly difficult to identify and mitigate. In this study, we assess media bias in LLM-generated content and LLMs' ability to detect subtle ideological bias. We conduct this evaluation using two datasets, PoliGen and EconoLex, covering political and economic discourse, respectively. We evaluate seven widely used LLMs by prompting them to generate articles and analyze their ideological preferences via Socratic probing. By using our self-contained Socratic approach, the study aims to directly measure the models' biases rather than relying on external interpretations, thereby minimizing subjective judgments about media bias. Our results reveal a consistent preference of Democratic over Republican positions across all models. Conversely, in economic topics, biases vary among Western LLMs, while those developed in China lean more strongly toward socialism. △ Less

Submitted 22 May, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

arXiv:2412.19545 [pdf]

Enhancing Media Literacy: The Effectiveness of (Human) Annotations and Bias Visualizations on Bias Detection

Authors: Timo Spinde, Fei Wu, Wolfgang Gaissmaier, Gianluca Demartini, Helge Giese

Abstract: Marking biased texts is a practical approach to increase media bias awareness among news consumers. However, little is known about the generalizability of such awareness to new topics or unmarked news articles, and the role of machine-generated bias labels in enhancing awareness remains unclear. This study tests how news consumers may be trained and pre-bunked to detect media bias with bias labels… ▽ More Marking biased texts is a practical approach to increase media bias awareness among news consumers. However, little is known about the generalizability of such awareness to new topics or unmarked news articles, and the role of machine-generated bias labels in enhancing awareness remains unclear. This study tests how news consumers may be trained and pre-bunked to detect media bias with bias labels obtained from different sources (Human or AI) and in various manifestations. We conducted two experiments with 470 and 846 participants, exposing them to various bias-labeling conditions. We subsequently tested how much bias they could identify in unlabeled news materials on new topics. The results show that both Human (t(467) = 4.55, p < .001, d = 0.42) and AI labels (t(467) = 2.49, p = .039, d = 0.23) increased correct detection compared to the control group. Human labels demonstrate larger effect sizes and higher statistical significance. The control group (t(467) = 4.51, p < .001, d = 0.21) also improves performance through mere exposure to study materials. We also find that participants trained with marked biased phrases detected bias most reliably (F(834,1) = 44.00, p < .001, η2part = 0.048). Our experimental framework provides theoretical implications for systematically assessing the generalizability of learning effects in identifying media bias. These findings also provide practical implications for developing news-reading platforms that offer bias indicators and designing media literacy curricula to enhance media bias awareness. △ Less

Submitted 30 December, 2024; v1 submitted 27 December, 2024; originally announced December 2024.

arXiv:2411.11081 [pdf, other]

The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection

Authors: Tomas Horych, Christoph Mandl, Terry Ruas, Andre Greiner-Petter, Bela Gipp, Akiko Aizawa, Timo Spinde

Abstract: High annotation costs from hiring or crowdsourcing complicate the creation of large, high-quality datasets needed for training reliable text classifiers. Recent research suggests using Large Language Models (LLMs) to automate the annotation process, reducing these costs while maintaining data quality. LLMs have shown promising results in annotating downstream tasks like hate speech detection and p… ▽ More High annotation costs from hiring or crowdsourcing complicate the creation of large, high-quality datasets needed for training reliable text classifiers. Recent research suggests using Large Language Models (LLMs) to automate the annotation process, reducing these costs while maintaining data quality. LLMs have shown promising results in annotating downstream tasks like hate speech detection and political framing. Building on the success in these areas, this study investigates whether LLMs are viable for annotating the complex task of media bias detection and whether a downstream media bias classifier can be trained on such data. We create annolexical, the first large-scale dataset for media bias classification with over 48000 synthetically annotated examples. Our classifier, fine-tuned on this dataset, surpasses all of the annotator LLMs by 5-9 percent in Matthews Correlation Coefficient (MCC) and performs close to or outperforms the model trained on human-labeled data when evaluated on two media bias benchmark datasets (BABE and BASIL). This study demonstrates how our approach significantly reduces the cost of dataset creation in the media bias domain and, by extension, the development of classifiers, while our subsequent behavioral stress-testing reveals some of its current limitations and trade-offs. △ Less

Submitted 24 January, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

arXiv:2407.17111 [pdf, other]

News Ninja: Gamified Annotation of Linguistic Bias in Online News

Authors: Smi Hinterreiter, Timo Spinde, Sebastian Oberdörfer, Isao Echizen, Marc Erich Latoschik

Abstract: Recent research shows that visualizing linguistic bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sent… ▽ More Recent research shows that visualizing linguistic bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreements than expert and crowdsourced datasets with similar data quality. As News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation. △ Less

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.17045 [pdf, other]

NewsUnfold: Creating a News-Reading Application That Indicates Linguistic Media Bias and Collects Feedback

Authors: Smi Hinterreiter, Martin Wessel, Fabian Schliski, Isao Echizen, Marc Erich Latoschik, Timo Spinde

Abstract: Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address digital media bias is to detect and indicate it automatically through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. Human-in-the-loop-based feedback mechanisms have proven an effective way to facilitate the data-g… ▽ More Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address digital media bias is to detect and indicate it automatically through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. Human-in-the-loop-based feedback mechanisms have proven an effective way to facilitate the data-gathering process. Therefore, we introduce and test feedback mechanisms for the media bias domain, which we then implement on NewsUnfold, a news-reading web application to collect reader feedback on machine-generated bias highlights within online news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, the feedback mechanism shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnfold demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses and continuously update datasets to changes in context. △ Less

Submitted 29 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

arXiv:2403.07910 [pdf, ps, other]

MAGPIE: Multi-Task Media-Bias Analysis Generalization for Pre-Trained Identification of Expressions

Authors: Tomáš Horych, Martin Wessel, Jan Philip Wahle, Terry Ruas, Jerome Waßmuth, André Greiner-Petter, Akiko Aizawa, Bela Gipp, Timo Spinde

Abstract: Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of… ▽ More Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative improvement of 3.3% F1-score. MAGPIE also performs better than previous models on 5 out of 8 tasks in the Media Bias Identification Benchmark (MBIB). Using a RoBERTa encoder, MAGPIE needs only 15% of finetuning steps compared to single-task approaches. Our evaluation shows, for instance, that tasks like sentiment and emotionality boost all learning, all tasks enhance fake news detection, and scaling tasks leads to the best results. MAGPIE confirms that MTL is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models. Furthermore, LBM is the first available resource collection focused on media bias MTL. △ Less

Submitted 12 June, 2025; v1 submitted 26 February, 2024; originally announced March 2024.

Journal ref: LREC-COLING 2024

arXiv:2312.16148 [pdf, other]

The Media Bias Taxonomy: A Systematic Literature Review on the Forms and Automated Detection of Media Bias

Authors: Timo Spinde, Smi Hinterreiter, Fabian Haak, Terry Ruas, Helge Giese, Norman Meuschke, Bela Gipp

Abstract: The way the media presents events can significantly affect public perception, which in turn can alter people's beliefs and views. Media bias describes a one-sided or polarizing perspective on a topic. This article summarizes the research on computational methods to detect media bias by systematically reviewing 3140 research papers published between 2019 and 2022. To structure our review and suppor… ▽ More The way the media presents events can significantly affect public perception, which in turn can alter people's beliefs and views. Media bias describes a one-sided or polarizing perspective on a topic. This article summarizes the research on computational methods to detect media bias by systematically reviewing 3140 research papers published between 2019 and 2022. To structure our review and support a mutual understanding of bias across research domains, we introduce the Media Bias Taxonomy, which provides a coherent overview of the current state of research on media bias from different perspectives. We show that media bias detection is a highly active research field, in which transformer-based classification approaches have led to significant improvements in recent years. These improvements include higher classification accuracy and the ability to detect more fine-granular types of bias. However, we have identified a lack of interdisciplinarity in existing projects, and a need for more awareness of the various types of media bias to support methodologically thorough performance evaluations of media bias detection systems. Concluding from our analysis, we see the integration of recent machine learning advancements with reliable and diverse bias assessment strategies from other research areas as the most promising area for future research contributions in the field. △ Less

Submitted 10 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2304.13148 [pdf, other]

doi 10.1145/3539618.3591882

Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection

Authors: Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Bela Gipp, Timo Spinde

Abstract: Although media bias detection is a complex multi-task problem, there is, to date, no unified benchmark grouping these evaluation tasks. We introduce the Media Bias Identification Benchmark (MBIB), a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After review… ▽ More Although media bias detection is a complex multi-task problem, there is, to date, no unified benchmark grouping these evaluation tasks. We introduce the Media Bias Identification Benchmark (MBIB), a comprehensive benchmark that groups different types of media bias (e.g., linguistic, cognitive, political) under a common framework to test how prospective detection techniques generalize. After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques. We evaluate MBIB using state-of-the-art Transformer techniques (e.g., T5, BART). Our results suggest that while hate speech, racial bias, and gender bias are easier to detect, models struggle to handle certain bias types, e.g., cognitive and political bias. However, our results show that no single technique can outperform all the others significantly. We also find an uneven distribution of research interest and resource allocation to the individual tasks in media bias. A unified benchmark encourages the development of more robust systems and shifts the current paradigm in media bias detection evaluation towards solutions that tackle not one but multiple media bias types simultaneously. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: To be published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)

arXiv:2303.09957 [pdf, other]

doi 10.1007/978-3-031-28032-0_31

A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents

Authors: Norman Meuschke, Apurva Jagdale, Timo Spinde, Jelena Mitrović, Bela Gipp

Abstract: Extracting information from academic PDF documents is crucial for numerous indexing, retrieval, and analysis use cases. Choosing the best tool to extract specific content elements is difficult because many, technically diverse tools are available, but recent performance benchmarks are rare. Moreover, such benchmarks typically cover only a few content elements like header metadata or bibliographic… ▽ More Extracting information from academic PDF documents is crucial for numerous indexing, retrieval, and analysis use cases. Choosing the best tool to extract specific content elements is difficult because many, technically diverse tools are available, but recent performance benchmarks are rare. Moreover, such benchmarks typically cover only a few content elements like header metadata or bibliographic references and use smaller datasets from specific academic disciplines. We provide a large and diverse evaluation framework that supports more extraction tasks than most related datasets. Our framework builds upon DocBank, a multi-domain dataset of 1.5M annotated content elements extracted from 500K pages of research papers on arXiv. Using the new framework, we benchmark ten freely available tools in extracting document metadata, bibliographic references, tables, and other content elements from academic PDF documents. GROBID achieves the best metadata and reference extraction results, followed by CERMINE and Science Parse. For table extraction, Adobe Extract outperforms other tools, even though the performance is much lower than for other content elements. All tools struggle to extract lists, footers, and equations. We conclude that more research on improving and combining tools is necessary to achieve satisfactory extraction quality for most content elements. Evaluation datasets and frameworks like the one we present support this line of research. We make our data and code publicly available to contribute toward this goal. △ Less

Submitted 17 March, 2023; originally announced March 2023.

Comments: iConference 2023

arXiv:2211.03491 [pdf, other]

doi 10.1007/978-3-030-96957-8_20

Exploiting Transformer-based Multitask Learning for the Detection of Media Bias in News Articles

Authors: Timo Spinde, Jan-David Krieger, Terry Ruas, Jelena Mitrović, Franz Götz-Hahn, Akiko Aizawa, Bela Gipp

Abstract: Media has a substantial impact on the public perception of events. A one-sided or polarizing perspective on any topic is usually described as media bias. One of the ways how bias in news articles can be introduced is by altering word choice. Biased word choices are not always obvious, nor do they exhibit high context-dependency. Hence, detecting bias is often difficult. We propose a Transformer-ba… ▽ More Media has a substantial impact on the public perception of events. A one-sided or polarizing perspective on any topic is usually described as media bias. One of the ways how bias in news articles can be introduced is by altering word choice. Biased word choices are not always obvious, nor do they exhibit high context-dependency. Hence, detecting bias is often difficult. We propose a Transformer-based deep learning architecture trained via Multi-Task Learning using six bias-related data sets to tackle the media bias detection problem. Our best-performing implementation achieves a macro $F_{1}$ of 0.776, a performance boost of 3\% compared to our baseline, outperforming existing methods. Our results indicate Multi-Task Learning as a promising alternative to improve existing baseline models in identifying slanted reporting. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Journal ref: Proceedings of the iConference 2022

arXiv:2209.14557 [pdf, other]

doi 10.18653/v1/2021.findings-emnlp.101

Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts

Authors: Timo Spinde, Manuel Plank, Jan-David Krieger, Terry Ruas, Bela Gipp, Akiko Aizawa

Abstract: Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created… ▽ More Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created by trained experts, for media bias research. We also analyze why expert labeling is essential within this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level. Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically. Our best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods. △ Less

Submitted 29 September, 2022; originally announced September 2022.

Comments: substantial text overlap with Ph.D. proposal by same author, part of dissertation arXiv:2112.13352

Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2021

arXiv:2205.10773 [pdf, other]

doi 10.1145/3529372.3530932

A Domain-adaptive Pre-training Approach for Language Bias Detection in News

Authors: Jan-David Krieger, Timo Spinde, Terry Ruas, Juhi Kulshrestha, Bela Gipp

Abstract: Media bias is a multi-faceted construct influencing individual behavior and collective decision-making. Slanted news reporting is the result of one-sided and polarized writing which can occur in various forms. In this work, we focus on an important form of media bias, i.e. bias by word choice. Detecting biased word choices is a challenging task due to its linguistic complexity and the lack of repr… ▽ More Media bias is a multi-faceted construct influencing individual behavior and collective decision-making. Slanted news reporting is the result of one-sided and polarized writing which can occur in various forms. In this work, we focus on an important form of media bias, i.e. bias by word choice. Detecting biased word choices is a challenging task due to its linguistic complexity and the lack of representative gold-standard corpora. We present DA-RoBERTa, a new state-of-the-art transformer-based model adapted to the media bias domain which identifies sentence-level bias with an F1 score of 0.814. In addition, we also train, DA-BERT and DA-BART, two more transformer models adapted to the bias domain. Our proposed domain-adapted models outperform prior bias detection approaches on the same data. △ Less

Submitted 22 May, 2022; originally announced May 2022.

Journal ref: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries 2022 (JCDL)

arXiv:2112.13352 [pdf, other]

An Interdisciplinary Approach for the Automated Detection and Visualization of Media Bias in News Articles

Authors: Timo Spinde

Abstract: Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of gold-standard data sets and high context dependencies. In this research project, I aim to devise data sets and methods… ▽ More Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of gold-standard data sets and high context dependencies. In this research project, I aim to devise data sets and methods to identify media bias. To achieve this, I plan to research methods using natural language processing and deep learning while employing models and using analysis concepts from psychology and linguistics. The first results indicate the effectiveness of an interdisciplinary research approach. My vision is to devise a system that helps news readers become aware of media coverage differences caused by bias. So far, my best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels, indicating that distant supervision has the potential to become a solution for the difficult task of bias detection. △ Less

Submitted 26 December, 2021; originally announced December 2021.

Journal ref: 2021 IEEE International Conference on Data Mining Workshops (ICDMW)

arXiv:2112.07421 [pdf, other]

Towards A Reliable Ground-Truth For Biased Language Detection

Authors: Timo Spinde, David Krieger, Manuel Plank, Bela Gipp

Abstract: Reference texts such as encyclopedias and news articles can manifest biased language when objective reporting is substituted by subjective writing. Existing methods to detect bias mostly rely on annotated data to train machine learning models. However, low annotator agreement and comparability is a substantial drawback in available media bias corpora. To evaluate data collection options, we collec… ▽ More Reference texts such as encyclopedias and news articles can manifest biased language when objective reporting is substituted by subjective writing. Existing methods to detect bias mostly rely on annotated data to train machine learning models. However, low annotator agreement and comparability is a substantial drawback in available media bias corpora. To evaluate data collection options, we collect and compare labels obtained from two popular crowdsourcing platforms. Our results demonstrate the existing crowdsourcing approaches' lack of data quality, underlining the need for a trained expert framework to gather a more reliable dataset. By creating such a framework and gathering a first dataset, we are able to improve Krippendorff's $α$ = 0.144 (crowdsourcing labels) to $α$ = 0.419 (expert labels). We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems. We will continue to extend our dataset in the future. △ Less

Submitted 17 December, 2021; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.07392 [pdf]

Do You Think It's Biased? How To Ask For The Perception Of Media Bias

Authors: Timo Spinde, Christina Kreuter, Wolfgang Gaissmaier, Felix Hamborg, Bela Gipp, Helge Giese

Abstract: Media coverage possesses a substantial effect on the public perception of events. The way media frames events can significantly alter the beliefs and perceptions of our society. Nevertheless, nearly all media outlets are known to report news in a biased way. While such bias can be introduced by altering the word choice or omitting information, the perception of bias also varies largely depending o… ▽ More Media coverage possesses a substantial effect on the public perception of events. The way media frames events can significantly alter the beliefs and perceptions of our society. Nevertheless, nearly all media outlets are known to report news in a biased way. While such bias can be introduced by altering the word choice or omitting information, the perception of bias also varies largely depending on a reader's personal background. Therefore, media bias is a very complex construct to identify and analyze. Even though media bias has been the subject of many studies, previous assessment strategies are oversimplified, lack overlap and empirical evaluation. Thus, this study aims to develop a scale that can be used as a reliable standard to evaluate article bias. To name an example: Intending to measure bias in a news article, should we ask, "How biased is the article?" or should we instead ask, "How did the article treat the American president?". We conducted a literature search to find 824 relevant questions about text perception in previous research on the topic. In a multi-iterative process, we summarized and condensed these questions semantically to conclude a complete and representative set of possible question types about bias. The final set consisted of 25 questions with varying answering formats, 17 questions using semantic differentials, and six ratings of feelings. We tested each of the questions on 190 articles with overall 663 participants to identify how well the questions measure an article's perceived bias. Our results show that 21 final items are suitable and reliable for measuring the perception of media bias. We publish the final set of questions on http://bias-question-tree.gipplab.org/. △ Less

Submitted 16 December, 2021; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.07391 [pdf]

TASSY -- A Text Annotation Survey System

Authors: Timo Spinde, Kanishka Sinha, Norman Meuschke, Bela Gipp

Abstract: We present a free and open-source tool for creating web-based surveys that include text annotation tasks. Existing tools offer either text annotation or survey functionality but not both. Combining the two input types is particularly relevant for investigating a reader's perception of a text which also depends on the reader's background, such as age, gender, and education. Our tool caters primaril… ▽ More We present a free and open-source tool for creating web-based surveys that include text annotation tasks. Existing tools offer either text annotation or survey functionality but not both. Combining the two input types is particularly relevant for investigating a reader's perception of a text which also depends on the reader's background, such as age, gender, and education. Our tool caters primarily to the needs of researchers in the Library and Information Sciences, the Social Sciences, and the Humanities who apply Content Analysis to investigate, e.g., media bias, political communication, or fake news. △ Less

Submitted 16 December, 2021; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.07384 [pdf]

doi 10.1007/978-3-030-71305-8_17

Identification of Biased Terms in News Articles by Comparison of Outlet-specific Word Embeddings

Authors: Timo Spinde, Lada Rudnitckaia, Felix Hamborg, Bela Gipp

Abstract: Slanted news coverage, also called media bias, can heavily influence how news consumers interpret and react to the news. To automatically identify biased language, we present an exploratory approach that compares the context of related words. We train two word embedding models, one on texts of left-wing, the other on right-wing news outlets. Our hypothesis is that a word's representations in both… ▽ More Slanted news coverage, also called media bias, can heavily influence how news consumers interpret and react to the news. To automatically identify biased language, we present an exploratory approach that compares the context of related words. We train two word embedding models, one on texts of left-wing, the other on right-wing news outlets. Our hypothesis is that a word's representations in both word embedding spaces are more similar for non-biased words than biased words. The underlying idea is that the context of biased words in different news outlets varies more strongly than the one of non-biased words, since the perception of a word as being biased differs depending on its context. While we do not find statistical significance to accept the hypothesis, the results show the effectiveness of the approach. For example, after a linear mapping of both word embeddings spaces, 31% of the words with the largest distances potentially induce bias. To improve the results, we find that the dataset needs to be significantly larger, and we derive further methodology as future research direction. To our knowledge, this paper presents the first in-depth look at the context of bias words measured by word embeddings. △ Less

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2110.09151 [pdf, other]

How to Effectively Identify and Communicate Person-Targeting Media Bias in Daily News Consumption?

Authors: Felix Hamborg, Timo Spinde, Kim Heinser, Karsten Donnay, Bela Gipp

Abstract: Slanted news coverage strongly affects public opinion. This is especially true for coverage on politics and related issues, where studies have shown that bias in the news may influence elections and other collective decisions. Due to its viable importance, news coverage has long been studied in the social sciences, resulting in comprehensive models to describe it and effective yet costly methods t… ▽ More Slanted news coverage strongly affects public opinion. This is especially true for coverage on politics and related issues, where studies have shown that bias in the news may influence elections and other collective decisions. Due to its viable importance, news coverage has long been studied in the social sciences, resulting in comprehensive models to describe it and effective yet costly methods to analyze it, such as content analysis. We present an in-progress system for news recommendation that is the first to automate the manual procedure of content analysis to reveal person-targeting biases in news articles reporting on policy issues. In a large-scale user study, we find very promising results regarding this interdisciplinary research direction. Our recommender detects and reveals substantial frames that are actually present in individual news articles. In contrast, prior work rather only facilitates the visibility of biases, e.g., by distinguishing left- and right-wing outlets. Further, our study shows that recommending news articles that differently frame an event significantly improves respondents' awareness of bias. △ Less

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2105.11910 [pdf]

MBIC -- A Media Bias Annotation Dataset Including Annotator Characteristics

Authors: T. Spinde, L. Rudnitckaia, K. Sinha, F. Hamborg, B. Gipp, K. Donnay

Abstract: Many people consider news articles to be a reliable source of information on current events. However, due to the range of factors influencing news agencies, such coverage may not always be impartial. Media bias, or slanted news coverage, can have a substantial impact on public perception of events, and, accordingly, can potentially alter the beliefs and views of the public. The main data gap in cu… ▽ More Many people consider news articles to be a reliable source of information on current events. However, due to the range of factors influencing news agencies, such coverage may not always be impartial. Media bias, or slanted news coverage, can have a substantial impact on public perception of events, and, accordingly, can potentially alter the beliefs and views of the public. The main data gap in current research on media bias detection is a robust, representative, and diverse dataset containing annotations of biased words and sentences. In particular, existing datasets do not control for the individual background of annotators, which may affect their assessment and, thus, represents critical information for contextualizing their annotations. In this poster, we present a matrix-based methodology to crowdsource such data using a self-developed annotation platform. We also present MBIC (Media Bias Including Characteristics) - the first sample of 1,700 statements representing various media bias instances. The statements were reviewed by ten annotators each and contain labels for media bias identification both on the word and sentence level. MBIC is the first available dataset about media bias reporting detailed information on annotator characteristics and their individual background. The current dataset already significantly extends existing data in this domain providing unique and more reliable insights into the perception of bias. In future, we will further extend it both with respect to the number of articles and annotators per article. △ Less

Submitted 20 May, 2021; originally announced May 2021.

arXiv:2105.09640 [pdf]

doi 10.1145/3383583.3398619

Enabling News Consumers to View and Understand Biased News Coverage: A Study on the Perception and Visualization of Media Bias

Authors: Timo Spinde, Felix Hamborg, Karsten Donnay, Angelica Becerra, Bela Gipp

Abstract: Traditional media outlets are known to report political news in a biased way, potentially affecting the political beliefs of the audience and even altering their voting behaviors. Many researchers focus on automatically detecting and identifying media bias in the news, but only very few studies exist that systematically analyze how theses biases can be best visualized and communicated. We create t… ▽ More Traditional media outlets are known to report political news in a biased way, potentially affecting the political beliefs of the audience and even altering their voting behaviors. Many researchers focus on automatically detecting and identifying media bias in the news, but only very few studies exist that systematically analyze how theses biases can be best visualized and communicated. We create three manually annotated datasets and test varying visualization strategies. The results show no strong effects of becoming aware of the bias of the treatment groups compared to the control group, although a visualization of hand-annotated bias communicated bias instances more effectively than a framing visualization. Showing participants an overview page, which opposes different viewpoints on the same topic, does not yield differences in respondents' bias perception. Using a multilevel model, we find that perceived journalist bias is significantly related to perceived political extremeness and impartiality of the article. △ Less

Submitted 20 May, 2021; originally announced May 2021.

Showing 1–21 of 21 results for author: Spinde, T