-
Misleading Ourselves: How Disinformation Manipulates Sensemaking
Authors:
Stephen Prochaska,
Julie Vera,
Douglas Lew Tan,
Kate Starbird
Abstract:
Informal sensemaking surrounding U.S. election processes has been fraught in recent years, due to the inherent uncertainty of elections, the complexity of election processes in the U.S., and disinformation. Based on insights from qualitative analysis of election rumors spreading online in 2020 and 2022, we introduce the concept of manipulated sensemaking to describe how disinformation functions by disrupting online audiences' ability to make sense of novel, uncertain, or ambiguous information. We describe how at the core of this disruption is the ability of disinformation to shape broad, underlying stories called deep stories, which determine the frames we use to make sense of this novel information. Additionally, we explain how sensemaking's orientation around plausible explanations over accurate explanations makes it vulnerable to manipulation. Lastly, we demonstrate how disinformed deep stories shape sensemaking not just for a single event, but for many events in the future.
Submitted 18 October, 2024;
originally announced October 2024.
-
LLM Confidence Evaluation Measures in Zero-Shot CSS Classification
Authors:
David Farr,
Iain Cruickshank,
Nico Manzonelli,
Nicholas Clark,
Kate Starbird,
Jevin West
Abstract:
Assessing classification confidence is critical for leveraging large language models (LLMs) in automated labeling tasks, especially in the sensitive domains presented by Computational Social Science (CSS) tasks. In this paper, we make three key contributions: (1) we propose an uncertainty quantification (UQ) performance measure tailored for data annotation tasks, (2) we compare, for the first time, five different UQ strategies across three distinct LLMs and CSS data annotation tasks, and (3) we introduce a novel UQ aggregation strategy that effectively identifies low-confidence LLM annotations and disproportionately uncovers data incorrectly labeled by the LLMs. Our results demonstrate that our proposed UQ aggregation strategy improves upon existing methods and can be used to significantly improve human-in-the-loop data annotation processes.
Submitted 1 November, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
-
LLM Chain Ensembles for Scalable and Accurate Data Annotation
Authors:
David Farr,
Nico Manzonelli,
Iain Cruickshank,
Kate Starbird,
Jevin West
Abstract:
The ability of large language models (LLMs) to perform zero-shot classification makes them viable solutions for data annotation in rapidly evolving domains where quality labeled data is often scarce and costly to obtain. However, the large-scale deployment of LLMs can be prohibitively expensive. This paper introduces an LLM chain ensemble methodology that aligns multiple LLMs in a sequence, routing data subsets to subsequent models based on classification uncertainty. This approach leverages the strengths of individual LLMs within a broader system, allowing each model to handle data points where it exhibits the highest confidence, while forwarding more complex cases to potentially more robust models. Our results show that the chain ensemble method often exceeds the performance of the best individual model in the chain and achieves substantial cost savings, making LLM chain ensembles a practical and efficient solution for large-scale data annotation challenges.
Submitted 1 November, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
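The routing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Classifier` interface, the fixed confidence `threshold`, and the toy keyword models are all assumptions made for illustration. Each model in the chain keeps the items it labels with confidence at or above the threshold and forwards the rest; the last model labels whatever remains.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a classifier maps a text to (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def chain_ensemble(
    texts: Sequence[str],
    chain: Sequence[Classifier],
    threshold: float = 0.8,  # assumed routing rule; the abstract does not specify one
) -> Dict[int, Tuple[str, int]]:
    """Return {text index: (label, position of the model that labeled it)}."""
    labels: Dict[int, Tuple[str, int]] = {}
    remaining: List[int] = list(range(len(texts)))
    for position, model in enumerate(chain):
        is_last = position == len(chain) - 1
        forwarded: List[int] = []
        for i in remaining:
            label, confidence = model(texts[i])
            if confidence >= threshold or is_last:
                # Confident enough (or no model left to forward to): keep the label.
                labels[i] = (label, position)
            else:
                # Uncertain: route this item to the next model in the chain.
                forwarded.append(i)
        remaining = forwarded
        if not remaining:
            break
    return labels


# Toy stand-ins for a cheap model and a stronger model (illustrative only).
def cheap_model(text: str) -> Tuple[str, float]:
    return ("positive", 0.9) if "great" in text else ("negative", 0.5)


def strong_model(text: str) -> Tuple[str, float]:
    positive = "good" in text or "great" in text
    return ("positive", 0.95) if positive else ("negative", 0.95)


result = chain_ensemble(["great movie", "pretty good", "awful"], [cheap_model, strong_model])
```

In this toy run only the first text is settled by the cheap model; the other two are forwarded, which is where the cost savings described in the abstract would come from if the later models are the more expensive ones.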
-
ElectionRumors2022: A Dataset of Election Rumors on Twitter During the 2022 US Midterms
Authors:
Joseph S Schafer,
Kayla Duskin,
Stephen Prochaska,
Morgan Wack,
Anna Beers,
Lia Bozarth,
Taylor Agajanian,
Mike Caulfield,
Emma S Spiro,
Kate Starbird
Abstract:
Understanding the spread of online rumors is a pressing societal challenge and an active area of research across domains. In the context of the 2022 U.S. midterm elections, one influential social media platform for sharing information -- including rumors that may be false, misleading, or unsubstantiated -- was Twitter (now renamed X). To increase understanding of the dynamics of online rumors about elections, we present and analyze a dataset of 1.81 million Twitter posts corresponding to 135 distinct rumors which spread online during the midterm election season (September 5 to December 1, 2022). We describe how this data was collected, compiled, and supplemented, and provide a series of exploratory analyses along with comparisons to a previously published dataset on 2020 election rumors. We also conduct a mixed-methods analysis of three distinct rumors about the election in Arizona, a particularly prominent focus of 2022 election rumoring. Finally, we outline potential directions for how this dataset could be used to facilitate future research into online rumors, misinformation, and disinformation.
Submitted 22 July, 2024;
originally announced July 2024.
-
Viral Privacy: Contextual Integrity as a Lens to Understand Content Creators' Privacy Perceptions and Needs After Sudden Attention
Authors:
Joseph S. Schafer,
Annie Denton,
Chloe Seelhoff,
Jordyn Vo,
Kate Starbird
Abstract:
When designing multi-stakeholder privacy systems, it is important to consider how different groups of social media users have different goals and requirements for privacy. Additionally, even a single creator's needs can change as their online visibility and presence shift, and robust multi-stakeholder privacy systems should account for these shifts. Using the framework of contextual integrity, we explain a theoretical basis for evaluating the potentially changing privacy needs of users as their profiles undergo a sudden rise in online attention, and we describe ongoing projects to understand these potential shifts in perspectives.
Submitted 18 December, 2023;
originally announced December 2023.
-
Towards Incorporating Researcher Safety into Information Integrity Research Ethics
Authors:
Joseph S. Schafer,
Kate Starbird
Abstract:
Traditional research ethics has mainly, and rightly, focused on ensuring that participants are treated safely, justly, and ethically, to avoid violating their rights or putting them in harm's way. Information integrity research within CSCW has correspondingly focused mainly on these issues, and internet research ethics has primarily focused on increasing protections of participant data. However, as branches of internet research focus on more fraught contexts such as information integrity and problematic information, more explicit consideration of other ethical frames and subjects is warranted. In this workshop paper, we argue that researcher protections should be more explicitly considered and acknowledged in these studies, and should be considered alongside more standard ethical considerations for participants and for broader society.
Submitted 28 December, 2023; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias
Authors:
Zarine Kharazian,
Kate Starbird,
Benjamin Mako Hill
Abstract:
What types of governance arrangements make some self-governed online groups more vulnerable to disinformation campaigns? To answer this question, we present a qualitative comparative analysis of the Croatian and Serbian Wikipedia editions. We do so because between at least 2011 and 2020, the Croatian language version of Wikipedia was taken over by a small group of administrators who introduced far-right bias and outright disinformation; dissenting editorial voices were reverted, banned, and blocked. Although Serbian Wikipedia is roughly similar in size and age, shares many linguistic and cultural features, and faced similar threats, it seems to have largely avoided this fate. Based on a grounded theory analysis of interviews with members of both communities and others in cross-functional platform-level roles, we propose that the convergence of three features -- high perceived value as a target, limited early bureaucratic openness, and a preference for personalistic, informal forms of organization over formal ones -- produced a window of opportunity for governance capture on Croatian Wikipedia. Our findings illustrate that online community governing infrastructures can play a crucial role in systematic disinformation campaigns and other influence operations.
Submitted 6 November, 2023;
originally announced November 2023.
-
Followback Clusters, Satellite Audiences, and Bridge Nodes: Coengagement Networks for the 2020 US Election
Authors:
Andrew Beers,
Joseph S. Schafer,
Ian Kennedy,
Morgan Wack,
Emma S. Spiro,
Kate Starbird
Abstract:
The 2020 United States presidential election was, and has continued to be, the focus of pervasive and persistent mis- and disinformation spreading through our media ecosystems, including social media. This event has driven the collection and analysis of large, directed social network datasets, but such datasets can resist intuitive understanding. In such large datasets, the overwhelming number of nodes and edges present in typical representations creates visual artifacts, such as densely overlapping edges and tightly packed formations of low-degree nodes, which obscure many features of more practical interest. We apply a method, coengagement transformations, to convert such networks of social data into tractable images. Intuitively, this approach allows for parameterized network visualizations that make shared audiences of engaged viewers salient to viewers. Using the interpretative capabilities of this method, we perform an extensive case study of the 2020 United States presidential election on Twitter, contributing an empirical analysis of coengagement. By creating and contrasting different networks at different parameter sets, we define and characterize several structures in this discourse network, including bridging accounts, satellite audiences, and followback communities. We discuss the importance and implications of these empirical network features in this context. In addition, we release open-source code for creating coengagement networks from Twitter and other structured interaction data.
Submitted 30 May, 2023; v1 submitted 28 February, 2023;
originally announced March 2023.
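One plausible reading of a coengagement transformation can be sketched as follows. This is not the authors' released code: the function name, the `(user, account)` record format, and the two parameters `min_engagements` and `min_shared` are assumptions made for illustration. Two accounts are linked when enough distinct users have each engaged with both of them, so edges reflect shared audiences and not direct interactions between the accounts.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def coengagement_edges(
    engagements: Iterable[Tuple[str, str]],  # hypothetical records: (user, account engaged with)
    min_engagements: int = 1,  # assumed: engagements needed for a user to count toward an audience
    min_shared: int = 2,       # assumed: shared audience members needed to draw an edge
) -> Dict[Tuple[str, str], int]:
    """Return {(account_a, account_b): number of shared audience members}."""
    counts = Counter(engagements)  # (user, account) -> number of engagements
    audiences: Dict[str, Set[str]] = defaultdict(set)
    for (user, account), count in counts.items():
        if count >= min_engagements:
            audiences[account].add(user)

    edges: Dict[Tuple[str, str], int] = {}
    # Compare every pair of accounts; sorting makes edge keys deterministic.
    for a, b in combinations(sorted(audiences), 2):
        shared = len(audiences[a] & audiences[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


records = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
]
strict = coengagement_edges(records, min_shared=2)
loose = coengagement_edges(records, min_shared=1)
```

Raising or lowering the two thresholds yields sparser or denser networks, which is one way to read the abstract's contrast of "different networks at different parameter sets". The pairwise comparison is quadratic in the number of accounts, so a larger dataset would need an inverted index from users to accounts.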
-
An Agenda for Disinformation Research
Authors:
Nadya Bliss,
Elizabeth Bradley,
Joshua Garland,
Filippo Menczer,
Scott W. Ruston,
Kate Starbird,
Chris Wiggins
Abstract:
In the 21st-century information environment, adversarial actors use disinformation to manipulate public opinion. The distribution of false, misleading, or inaccurate information with the intent to deceive is an existential threat to the United States: distortion of information erodes trust in the socio-political institutions that are the fundamental fabric of democracy, including legitimate news sources, scientists, experts, and even fellow citizens. As a result, it becomes difficult for society to come together within a shared reality, the common ground needed to function effectively as an economy and a nation. Computing and communication technologies have facilitated the exchange of information at unprecedented speeds and scales. This has had countless benefits for society and the economy, but it has also played a fundamental role in the rising volume, variety, and velocity of disinformation. Technological advances have created new opportunities for manipulation, influence, and deceit. They have effectively lowered the barriers to reaching large audiences, diminishing the role of traditional mass media along with the editorial oversight they provided. The digitization of information exchange, however, also makes the practices of disinformation detectable, the networks of influence discernable, and suspicious content characterizable. New tools and approaches must be developed to leverage these affordances to understand and address this growing challenge.
Submitted 15 December, 2020;
originally announced December 2020.
-
What "Crowdsourcing" Obscures: Exposing the Dynamics of Connected Crowd Work during Disaster
Authors:
Kate Starbird
Abstract:
The aim of this paper is to demonstrate that the current understanding of crowdsourcing may not be broad enough to capture the diversity of crowd work during disasters, or specific enough to highlight the unique dynamics of information organizing by the crowd in that context. In making this argument, this paper first unpacks the crowdsourcing term, examining its roots in open source development and outsourcing business models, and tying it to related concepts of human computation and collective intelligence. The paper then attempts to characterize several examples of crowd work during disasters using current definitions of crowdsourcing and existing models for human computation and collective intelligence, exposing a need for future research towards a framework for understanding crowd work.
Submitted 15 April, 2012;
originally announced April 2012.