-
United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections
Authors:
Leah von der Heyde,
Anna-Carolina Haensch,
Alexander Wenz,
Bolei Ma
Abstract:
"Synthetic samples" based on large language models (LLMs) have been argued to serve as efficient alternatives to surveys of humans, assuming that their training data includes information on human attitudes and behavior. However, LLM-synthetic samples might exhibit bias, for example due to training data and fine-tuning processes being unrepresentative of diverse contexts. Such biases risk reinforcing existing biases in research, policymaking, and society. Therefore, researchers need to investigate if and under which conditions LLM-generated synthetic samples can be used for public opinion prediction. In this study, we examine to what extent LLM-based predictions of individual public opinion exhibit context-dependent biases by predicting the results of the 2024 European Parliament elections. Prompting three LLMs with individual-level background information of 26,000 eligible European voters, we ask the LLMs to predict each person's voting behavior. By comparing them to the actual results, we show that LLM-based predictions of future voting behavior largely fail, their accuracy is unequally distributed across national and linguistic contexts, and they require detailed attitudinal information in the prompt. The findings emphasize the limited applicability of LLM-synthetic samples to public opinion prediction. In investigating their contextual biases, this study contributes to the understanding and mitigation of inequalities in the development of LLMs and their applications in computational social science.
Submitted 17 April, 2025; v1 submitted 29 August, 2024;
originally announced September 2024.
-
Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion
Authors:
Leah von der Heyde,
Anna-Carolina Haensch,
Alexander Wenz
Abstract:
The recent development of large language models (LLMs) has spurred discussions about whether LLM-generated "synthetic samples" could complement or replace traditional surveys, considering their training data potentially reflects attitudes and behaviors prevalent in the population. A number of mostly US-based studies have prompted LLMs to mimic survey respondents, with some of them finding that the responses closely match the survey data. However, several contextual factors related to the relationship between the respective target population and LLM training data might affect the generalizability of such findings. In this study, we investigate the extent to which LLMs can estimate public opinion in Germany, using the example of vote choice. We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents. We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates on the aggregate and subgroup levels. We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties. While the LLM captures the tendencies of "typical" voter subgroups, such as partisans, it misses the multifaceted factors swaying individual voter choices. By examining the LLM-based prediction of voting behavior in a new context, our study contributes to the growing body of research about the conditions under which LLMs can be leveraged for studying public opinion. The findings point to disparities in opinion representation in LLMs and underscore the limitations in applying them for public opinion estimation.
Submitted 11 July, 2024;
originally announced July 2024.
-
The Second-Level Smartphone Divide: A Typology of Smartphone Usage Based on Frequency of Use, Skills, and Types of Activities
Authors:
Alexander Wenz,
Florian Keusch
Abstract:
This paper examines inequalities in the usage of smartphone technology based on five samples of smartphone owners collected in Germany and Austria between 2016 and 2020. We identify six distinct types of smartphone users by conducting latent class analyses that classify individuals based on their frequency of smartphone use, self-rated smartphone skills, and activities carried out on their smartphone. The results show that the smartphone usage types differ significantly by sociodemographic and smartphone-related characteristics: The types reflecting more frequent and diverse smartphone use are younger, have higher levels of educational attainment, and are more likely to use an iPhone. Overall, the composition of the latent classes and their characteristics are robust across samples and time.
Submitted 9 November, 2021;
originally announced November 2021.
-
Social Media Monitoring of the Campaigns for the 2013 German Bundestag Elections on Facebook and Twitter
Authors:
Lars Kaczmirek,
Philipp Mayr,
Ravi Vatrapu,
Arnim Bleier,
Manuela Blumenberg,
Tobias Gummer,
Abid Hussain,
Katharina Kinder-Kurlanda,
Kaveh Manshaei,
Mark Thamm,
Katrin Weller,
Alexander Wenz,
Christof Wolf
Abstract:
As more and more people use social media to communicate their view and perception of elections, researchers have increasingly been collecting and analyzing data from social media platforms. Our research focuses on social media communication related to the 2013 election of the German parliament [translation: Bundestagswahl 2013]. We constructed several social media datasets using data from Facebook and Twitter. First, we identified the most relevant candidates (n=2,346) and checked whether they maintained social media accounts. The Facebook data was collected in November 2013 for the period of January 2009 to October 2013. On Facebook we identified 1,408 Facebook walls containing approximately 469,000 posts. Twitter data was collected between June and December 2013 finishing with the constitution of the government. On Twitter we identified 1,009 candidates and 76 other agents, for example, journalists. We estimated the number of relevant tweets to exceed eight million for the period from July 27 to September 27 alone. In this document we summarize past research in the literature, discuss possibilities for research with our data set, explain the data collection procedures, and provide a description of the data and a discussion of issues for archiving and dissemination of social media data.
Submitted 1 April, 2014; v1 submitted 16 December, 2013;
originally announced December 2013.