-
Agent-Based Simulations of Online Political Discussions: A Case Study on Elections in Germany
Authors:
Abdul Sittar,
Simon Münker,
Fabio Sartori,
Andreas Reitenbach,
Achim Rettinger,
Michael Mäs,
Alenka Guček,
Marko Grobelnik
Abstract:
User engagement on social media platforms is influenced by historical context, time constraints, and reward-driven interactions. This study presents an agent-based simulation approach that models user interactions, considering past conversation history, motivation, and resource constraints. Utilizing German Twitter data on political discourse, we fine-tune AI models to generate posts and replies,…
▽ More
User engagement on social media platforms is influenced by historical context, time constraints, and reward-driven interactions. This study presents an agent-based simulation approach that models user interactions, considering past conversation history, motivation, and resource constraints. Utilizing German Twitter data on political discourse, we fine-tune AI models to generate posts and replies, incorporating sentiment analysis, irony detection, and offensiveness classification. The simulation employs a myopic best-response model to govern agent behavior, accounting for decision-making based on expected rewards. Our results highlight the impact of historical context on AI-generated responses and demonstrate how engagement evolves under varying constraints.
△ Less
Submitted 11 April, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Filtered Markovian Projection: Dimensionality Reduction in Filtering for Stochastic Reaction Networks
Authors:
Chiheb Ben Hammouda,
Maksim Chupin,
Sophia Münker,
Raúl Tempone
Abstract:
Stochastic reaction networks (SRNs) model stochastic effects for various applications, including intracellular chemical or biological processes and epidemiology. A typical challenge in practical problems modeled by SRNs is that only a few state variables can be dynamically observed. Given the measurement trajectories, one can estimate the conditional probability distribution of unobserved (hidden)…
▽ More
Stochastic reaction networks (SRNs) model stochastic effects for various applications, including intracellular chemical or biological processes and epidemiology. A typical challenge in practical problems modeled by SRNs is that only a few state variables can be dynamically observed. Given the measurement trajectories, one can estimate the conditional probability distribution of unobserved (hidden) state variables by solving a stochastic filtering problem. In this setting, the conditional distribution evolves over time according to an extensive or potentially infinite-dimensional system of coupled ordinary differential equations with jumps, known as the filtering equation. The current numerical filtering techniques, such as the Filtered Finite State Projection (D'Ambrosio et al., 2022), are hindered by the curse of dimensionality, significantly affecting their computational performance. To address these limitations, we propose to use a dimensionality reduction technique based on the Markovian projection (MP), initially introduced for forward problems (Ben Hammouda et al., 2024). In this work, we explore how to adapt the existing MP approach to the filtering problem and introduce a novel version of the MP, the Filtered MP, that guarantees the consistency of the resulting estimator. The novel method employs a reduced-variance particle filter for estimating the jump intensities of the projected model and solves the filtering equations in a low-dimensional space. The analysis and empirical results highlight the superior computational efficiency of projection methods compared to the existing filtered finite state projection in the large dimensional setting.
△ Less
Submitted 6 March, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory
Authors:
Simon Münker
Abstract:
Contemporary research in social sciences is increasingly utilizing state-of-the-art statistical language models to annotate or generate content. While these models perform benchmark-leading on common language tasks and show exemplary task-independent emergent abilities, transferring them to novel out-of-domain tasks is only insufficiently explored. The implications of the statistical black-box app…
▽ More
Contemporary research in social sciences is increasingly utilizing state-of-the-art statistical language models to annotate or generate content. While these models perform benchmark-leading on common language tasks and show exemplary task-independent emergent abilities, transferring them to novel out-of-domain tasks is only insufficiently explored. The implications of the statistical black-box approach - stochastic parrots - are prominently criticized in the language model research community; however, the significance for novel generative tasks is not.
This work investigates the alignment between personalized language models and survey participants on a Moral Foundation Theory questionnaire. We adapt text-to-text models to different political personas and survey the questionnaire repetitively to generate a synthetic population of persona and model combinations. Analyzing the intra-group variance and cross-alignment shows significant differences across models and personas. Our findings indicate that adapted models struggle to represent the survey-captured assessment of political ideologies. Thus, using language models to mimic social interactions requires measurable improvements in in-context optimization or parameter manipulation to align with psychological and sociological stereotypes. Without quantifiable alignment, generating politically nuanced content remains unfeasible. To enhance these representations, we propose a testable framework to generate agents based on moral value statements for future research.
△ Less
Submitted 21 August, 2024;
originally announced August 2024.
-
Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets
Authors:
Simon Münker,
Kai Kugler,
Achim Rettinger
Abstract:
Filtering and annotating textual data are routine tasks in many areas, like social media or news analytics. Automating these tasks allows to scale the analyses wrt. speed and breadth of content covered and decreases the manual effort required. Due to technical advancements in Natural Language Processing, specifically the success of large foundation models, a new tool for automating such annotation…
▽ More
Filtering and annotating textual data are routine tasks in many areas, like social media or news analytics. Automating these tasks allows to scale the analyses wrt. speed and breadth of content covered and decreases the manual effort required. Due to technical advancements in Natural Language Processing, specifically the success of large foundation models, a new tool for automating such annotation processes by using a text-to-text interface given written guidelines without providing training samples has become available.
In this work, we assess these advancements in-the-wild by empirically testing them in an annotation task on German Twitter data about social and political European crises. We compare the prompt-based results with our human annotation and preceding classification approaches, including Naive Bayes and a BERT-based fine-tuning/domain adaptation pipeline. Our results show that the prompt-based approach - despite being limited by local computation resources during the model selection - is comparable with the fine-tuned BERT but without any annotated training data. Our findings emphasize the ongoing paradigm shift in the NLP landscape, i.e., the unification of downstream tasks and elimination of the need for pre-labeled training data.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
InvBERT: Reconstructing Text from Contextualized Word Embeddings by inverting the BERT pipeline
Authors:
Kai Kugler,
Simon Münker,
Johannes Höhmann,
Achim Rettinger
Abstract:
Digital Humanities and Computational Literary Studies apply text mining methods to investigate literature. Such automated approaches enable quantitative studies on large corpora which would not be feasible by manual inspection alone. However, due to copyright restrictions, the availability of relevant digitized literary works is limited. Derived Text Formats (DTFs) have been proposed as a solution…
▽ More
Digital Humanities and Computational Literary Studies apply text mining methods to investigate literature. Such automated approaches enable quantitative studies on large corpora which would not be feasible by manual inspection alone. However, due to copyright restrictions, the availability of relevant digitized literary works is limited. Derived Text Formats (DTFs) have been proposed as a solution. Here, textual materials are transformed in such a way that copyright-critical features are removed, but that the use of certain analytical methods remains possible. Contextualized word embeddings produced by transformer-encoders (like BERT) are promising candidates for DTFs because they allow for state-of-the-art performance on various analytical tasks and, at first sight, do not disclose the original text. However, in this paper we demonstrate that under certain conditions the reconstruction of the original copyrighted text becomes feasible and its publication in the form of contextualized token representations is not safe. Our attempts to invert BERT suggest, that publishing the encoder as a black box together with the contextualized embeddings is critical, since it allows to generate data to train a decoder with a reconstruction accuracy sufficient to violate copyright laws.
△ Less
Submitted 22 December, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.