-
Building Bridges between Users and Content across Multiple Platforms during Natural Disasters
Authors:
Lynnette Hui Xian Ng,
Iain J. Cruickshank,
David Farr
Abstract:
Social media is a primary medium for information diffusion during natural disasters. The social media ecosystem has been used to identify destruction, analyze opinions and organize aid. While the overall picture and aggregate trends may be important, a crucial part of the picture is the connections on these sites. These bridges are essential to facilitate information flow within the network. In this work, we perform a multi-platform analysis (X, Reddit, YouTube) of Hurricanes Helene and Milton, which occurred in quick succession in the US in late 2024. We construct network graphs to understand the properties of effective bridging content and users. We find that bridges tend to exist on X, that bridging content is complex, and that bridging users have relatable affiliations based on gender, race and job. Public organizations can use these characteristics to manage their social media personas more effectively during natural disasters.
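The abstract does not give a formal definition of a bridge. One standard graph-theoretic notion, assumed here purely for illustration, is a bridge edge: an edge whose removal disconnects part of the network, so all information flow between the two sides must pass through it. The sketch below finds such edges with Tarjan's low-link algorithm. It is not the authors' method, and their analysis may instead rely on node-level measures such as betweenness centrality.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return the bridge edges of an undirected graph.

    A bridge is an edge whose removal increases the number of connected
    components. Uses Tarjan's low-link algorithm, written iteratively
    to avoid Python's recursion limit on large graphs.
    """
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))  # store the edge id so parallel edges are handled
        adj[v].append((u, i))

    disc, low = {}, {}  # discovery time and lowest reachable discovery time
    bridges = []
    timer = 0
    for root in list(adj):
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        # Stack entries: (node, id of the edge used to reach it, neighbor iterator)
        stack = [(root, -1, iter(adj[root]))]
        while stack:
            node, parent_edge, it = stack[-1]
            advanced = False
            for nxt, eid in it:
                if eid == parent_edge:
                    continue  # do not walk back along the tree edge
                if nxt in disc:
                    low[node] = min(low[node], disc[nxt])  # back edge
                else:
                    disc[nxt] = low[nxt] = timer
                    timer += 1
                    stack.append((nxt, eid, iter(adj[nxt])))
                    advanced = True
                    break
            if not advanced:
                stack.pop()
                if stack:
                    parent = stack[-1][0]
                    low[parent] = min(low[parent], low[node])
                    # No back edge from node's subtree reaches above parent.
                    if low[node] > disc[parent]:
                        bridges.append(edges[parent_edge])
    return bridges
```

For example, in a user-interaction graph built from replies or reposts, a bridge edge between two otherwise separate communities marks a single point through which content crosses between them.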
Submitted 4 February, 2025;
originally announced February 2025.
-
DIVERSE: A Dataset of YouTube Video Comment Stances with a Data Programming Model
Authors:
Iain J. Cruickshank,
Amir Soofi,
Lynnette Hui Xian Ng
Abstract:
Public opinion of military organizations significantly influences their ability to recruit talented individuals. As recruitment efforts increasingly extend into digital spaces like social media, it becomes essential to assess the stance of social media users toward online military content. However, there is a notable lack of data for analyzing opinions on military recruiting efforts online, compounded by challenges in stance labeling, which is crucial for understanding public perceptions. Despite the importance of stance analysis for successful online military recruitment, creating human-annotated, in-domain stance labels is resource-intensive. In this paper, we address both the challenges of stance labeling and the scarcity of data on public opinions of online military recruitment by introducing and releasing the DIVERSE dataset: https://doi.org/10.5281/zenodo.10493803. This dataset comprises all comments from the U.S. Army's official YouTube Channel videos. We employed a state-of-the-art weak supervision approach, leveraging large language models to label the stance of each comment toward its respective video and the U.S. Army. Our findings indicate that the U.S. Army's videos began attracting a significant number of comments after 2021, and that the stance distribution is generally balanced among supportive, oppositional, and neutral comments, with a slight skew toward oppositional over supportive comments.
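The abstract describes a weak supervision (data programming) approach in which large language models act as noisy labelers. The simplest way to combine several noisy labeling sources is a majority vote, shown below as a simplified stand-in: the paper's label model is described only as state of the art, and learned label models typically weight each source by its estimated accuracy rather than counting votes equally. The label names and the `None` abstain convention are assumptions.

```python
from collections import Counter

ABSTAIN = None  # assumed convention: a labeling function returns None when unsure


def majority_vote(votes, classes=("support", "oppose", "neutral")):
    """Aggregate noisy labeling-function outputs for one comment.

    Returns the most common non-abstaining label, or None when every
    source abstained or the top labels are tied.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN and v in classes)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the comment unlabeled
    return ranked[0][0]
```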
Submitted 28 October, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Developing a Natural Language Understanding Model to Characterize Cable News Bias
Authors:
Seth P. Benson,
Iain J. Cruickshank
Abstract:
Media bias has been extensively studied by both social and computational sciences. However, current work still has a large reliance on human input and subjective assessment to label biases. This is especially true for cable news research. To address these issues, we develop an unsupervised machine learning method to characterize the bias of cable news programs without any human input. This method relies on the analysis of what topics are mentioned through Named Entity Recognition and how those topics are discussed through Stance Analysis in order to cluster programs with similar biases together. Applying our method to 2020 cable news transcripts, we find that program clusters are consistent over time and roughly correspond to the cable news network of the program. This method reveals the potential for future tools to objectively assess media bias and characterize unfamiliar media environments.
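As a rough illustration of combining what is mentioned (named entities) with how it is discussed (stance), each program can be summarized as a profile that maps each entity to its average stance, and programs can then be compared pairwise. The profile format, the numeric stance scale, and the use of cosine similarity are assumptions for this sketch, not details taken from the paper. The resulting similarities could feed any standard clustering algorithm.

```python
import math
from collections import defaultdict


def stance_profile(mentions):
    """Average stance score per named entity for one program.

    `mentions` is a list of (entity, stance) pairs, where stance is an
    assumed numeric score such as -1 (against), 0 (neutral) or +1 (favor).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for entity, stance in mentions:
        totals[entity] += stance
        counts[entity] += 1
    return {e: totals[e] / counts[e] for e in totals}


def cosine(p, q):
    """Cosine similarity between two sparse entity -> stance profiles."""
    dot = sum(p[e] * q[e] for e in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```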
Submitted 17 October, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Prompting and Fine-Tuning Open-Sourced Large Language Models for Stance Classification
Authors:
Iain J. Cruickshank,
Lynnette Hui Xian Ng
Abstract:
Stance classification, the task of predicting the viewpoint of an author on a subject of interest, has long been a focal point of research in domains ranging from social science to machine learning. Current stance detection methods rely predominantly on manual annotation of sentences, followed by training a supervised machine learning model. However, this manual annotation process is laborious, which hampers the approach's ability to generalize across different contexts. In this work, we investigate the use of Large Language Models (LLMs) as a stance detection methodology that can reduce or even eliminate the need for manual annotations. We investigate 10 open-source models and 7 prompting schemes, finding that LLMs are competitive with in-domain supervised models but are not necessarily consistent in their performance. We also fine-tuned the LLMs, but found that the fine-tuning process does not necessarily lead to better performance. In general, we find that LLMs do not routinely outperform smaller supervised machine learning models, and we therefore call for stance detection to be a benchmark that LLMs are also optimized for. The code used in this study is available at https://github.com/ijcruic/LLM-Stance-Labeling
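A zero-shot prompting scheme needs two pieces: a template that asks the model for a stance, and a parser that maps its free-form reply to a label. The template and label set below are hypothetical and are not among the paper's seven prompting schemes (those are in the linked repository). The sketch makes no model call and only shows the prompt-and-parse step.

```python
import re

LABELS = ("FAVOR", "AGAINST", "NONE")  # assumed label set


def build_prompt(text, target):
    """Build a hypothetical zero-shot stance prompt."""
    return (
        f"What is the stance of the following text toward {target}?\n"
        f"Answer with exactly one word: {', '.join(LABELS)}.\n\n"
        f"Text: {text}\nStance:"
    )


def parse_stance(completion):
    """Map a free-form LLM completion to a label, or None if unparseable."""
    # Whole-word match so that e.g. "Nonetheless" is not read as NONE.
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", completion.upper())
    return match.group(1) if match else None
```

Returning `None` for unparseable replies keeps format failures separate from genuine `NONE` (neutral) predictions, which matters when, as the abstract notes, model behavior is not consistent.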
Submitted 5 March, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
Analysis of Media Writing Style Bias through Text-Embedding Networks
Authors:
Iain J. Cruickshank,
Jessica Zhu,
Nathaniel D. Bastian
Abstract:
With the rise of phenomena like 'fake news' and the growth of heavily biased media ecosystems, there has been increased attention on understanding and evaluating media bias. Of particular note in the evaluation of media bias is writing style bias, which includes lexical bias and framing bias. We propose a novel approach to evaluating writing style bias that utilizes natural language similarity estimation and a network-based representation of the shared content between articles to perform bias characterization. Our proposed method presents a new means of evaluating writing style bias that does not rely on human experts or knowledge of a media producer's publication procedures. The results of experiments on real-world vaccine mandate data demonstrate the utility of the technique and show that the standard bias labeling procedure, which assigns only one bias label to each media producer, is insufficient to truly characterize the bias of that media producer.
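A network-based representation of shared content can be sketched by embedding each article, computing pairwise cosine similarity, and linking articles whose similarity passes a threshold. The embedding source, the 0.8 threshold, and the function names are assumptions; the abstract does not specify how the authors estimate similarity or build their network.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def similarity_network(embeddings, threshold=0.8):
    """Return weighted edges (a, b, sim) between sufficiently similar articles.

    `embeddings` maps an article id to a vector, e.g. from a sentence
    encoder (assumed; any fixed-length text embedding would do).
    """
    edges = []
    for (a, u), (b, v) in combinations(embeddings.items(), 2):
        sim = cosine(u, v)
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges
```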
Submitted 22 May, 2023;
originally announced May 2023.
-
Measuring Classification Decision Certainty and Doubt
Authors:
Alexander M. Berenbeim,
Iain J. Cruickshank,
Susmit Jha,
Robert H. Thomson,
Nathaniel D. Bastian
Abstract:
Quantitative characterizations and estimations of uncertainty are of fundamental importance in optimization and decision-making processes. Herein, we propose intuitive scores, which we call certainty and doubt, that can be used in both a Bayesian and frequentist framework to assess and compare the quality and uncertainty of predictions in (multi-)classification decision machine learning problems.
Submitted 27 March, 2023; v1 submitted 25 March, 2023;
originally announced March 2023.
-
Multi-modal Networks Reveal Patterns of Operational Similarity of Terrorist Organizations
Authors:
Gian Maria Campedelli,
Iain J. Cruickshank,
Kathleen M. Carley
Abstract:
Capturing dynamics of operational similarity among terrorist groups is critical to provide actionable insights for counter-terrorism and intelligence monitoring. Yet, in spite of its theoretical and practical relevance, research addressing this problem is currently lacking. We tackle this problem by proposing a novel computational framework for detecting clusters of terrorist groups sharing similar behaviors, focusing on groups' yearly repertoire of deployed tactics, attacked targets, and utilized weapons. Considering organizations that plotted at least 50 attacks between 1997 and 2018, a total of 105 groups responsible for more than 42,000 events worldwide, we offer three sets of results. First, we show that over the years global terrorism has been characterized by increasing operational cohesiveness. Second, we highlight that year-to-year stability in co-clustering among groups has been particularly high from 2009 to 2018, indicating temporal consistency of similarity patterns in the last decade. Third, we demonstrate that operational similarity between two organizations is driven by three factors: (a) their overall activity; (b) the difference in the diversity of their operational repertoires; (c) the difference in a combined measure of diversity and activity. Groups' operational preferences, geographical homophily and ideological affinity play no consistent role in determining operational similarity.
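The abstract lists three modes of a group's yearly repertoire: tactics, targets and weapons. One simple, hypothetical way to score operational similarity between two groups in a given year is the mean Jaccard overlap across these modes, which could then weight edges in a group-by-group network before clustering. This is an illustrative assumption, not the authors' multi-modal framework.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def repertoire_similarity(rep_a, rep_b):
    """Mean Jaccard similarity across modes such as tactics, targets, weapons.

    Each repertoire maps a mode name to the set of categories a group
    deployed in a given year; a missing mode counts as an empty set.
    """
    modes = rep_a.keys() | rep_b.keys()
    if not modes:
        return 0.0
    total = sum(jaccard(rep_a.get(m, set()), rep_b.get(m, set())) for m in modes)
    return total / len(modes)
```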
Submitted 15 December, 2021;
originally announced December 2021.
-
Influencing the Influencers: Evaluating Person-to-Person Influence on Social Networks Using Granger Causality
Authors:
Richard Kuzma,
Iain J. Cruickshank,
Kathleen M. Carley
Abstract:
We introduce a novel method for analyzing person-to-person content influence on Twitter. Using an Ego-Alter framework and Granger Causality, we examine President Donald Trump (the Ego) and the people he retweets (Alters) as a case study. We find that each Alter has a different scope of influence across multiple topics and a different magnitude of influence on a given topic, and that the magnitude of a single Alter's influence can vary across topics. This work is novel in its focus on person-to-person influence and content-based influence. Its impact is two-fold: (1) identifying "canaries in the coal mine" who could be observed by misinformation researchers or platforms to identify misinformation narratives before super-influencers spread them to large audiences, and (2) enabling digital marketing targeted toward upstream Alters of super-influencers.
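Granger causality asks whether past values of one series improve predictions of another beyond that series' own past. The sketch below computes the one-lag F statistic by comparing a restricted regression (y on its own lag) with an unrestricted one (adding the lag of x). Treating x as an Alter's daily volume on a topic and y as the Ego's is an assumed framing; the paper's lag order, preprocessing and significance thresholds are not given in the abstract. The hypothetical `demo` function generates synthetic data for illustration only.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit.

    Solves the normal equations (X'X) beta = X'y by Gauss-Jordan
    elimination with partial pivoting; assumes X'X is non-singular.
    """
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, row))) ** 2 for row, t in zip(X, y)
    )


def granger_f(x, y):
    """F statistic for the hypothesis 'x Granger-causes y' using one lag.

    Compares y_t ~ 1 + y_{t-1} (restricted) with
    y_t ~ 1 + y_{t-1} + x_{t-1} (unrestricted).
    """
    n = len(y) - 1  # usable observations after lagging
    target = y[1:]
    restricted = [[1.0, y[t]] for t in range(n)]
    unrestricted = [[1.0, y[t], x[t]] for t in range(n)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    q, k = 1, 3  # one restriction; three unrestricted parameters
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


def demo(seed=0, n=200):
    """Simulate y driven by lagged x; return F for both directions."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
    return granger_f(x, y), granger_f(y, x)
```

In practice, a library routine such as `grangercausalitytests` in statsmodels would also report p-values and test several lags; the hand-rolled version here only shows the restricted-versus-unrestricted comparison.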
Submitted 10 October, 2021;
originally announced October 2021.
-
Analysis of External Content in the Vaccination Discussion on Twitter
Authors:
Richard Kuzma,
Iain J. Cruickshank,
Kathleen M. Carley
Abstract:
The spread of coronavirus and anti-vaccine conspiracies online hindered public health responses to the pandemic. We examined the content of external articles shared on Twitter from February to June 2020 to understand how conspiracy theories and fake news competed with legitimate sources of information. Examining external content (articles, rather than social media posts) is a novel methodology that allows for analysis of misinformation that is not specific to any social media platform, tracking of changing narratives over time, and determination of which types of resources (government, news, scientific, or dubious) dominate the pandemic vaccine conversation. We find that distinct narratives emerge, that those narratives change over time, and that a lack of government and scientific messaging on coronavirus created an information vacuum filled by both traditional news and conspiracy theories.
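Examining external content starts with grouping shared links by the type of resource they point to. The domain lists below are small, hypothetical placeholders rather than the study's coding scheme. Any link that matches none of them is returned as `other` for manual review, for example to decide whether it is dubious.

```python
from urllib.parse import urlparse

# Hypothetical, illustrative domain lists; a real study needs curated lists.
SOURCE_TYPES = {
    "government": {"cdc.gov", "who.int", "nih.gov"},
    "scientific": {"nature.com", "thelancet.com", "nejm.org"},
    "news": {"nytimes.com", "reuters.com", "apnews.com"},
}


def classify_url(url):
    """Return the resource type of a shared link, or 'other' if unknown."""
    host = (urlparse(url).hostname or "").lower()
    for kind, domains in SOURCE_TYPES.items():
        # Match the domain itself or any subdomain of it (e.g. www.cdc.gov),
        # but not look-alikes such as notcdc.gov.
        if any(host == d or host.endswith("." + d) for d in domains):
            return kind
    return "other"
```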
Submitted 3 September, 2021; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Characterizing Communities of Hashtag Usage on Twitter During the 2020 COVID-19 Pandemic by Multi-view Clustering
Authors:
Iain J. Cruickshank,
Kathleen M. Carley
Abstract:
The COVID-19 pandemic has produced a flurry of online activity on social media sites. As such, analysis of social media data during the COVID-19 pandemic can produce unique insights into discussion topics and how those topics evolve over the course of the pandemic. In this study, we propose analyzing discussion topics on Twitter by clustering hashtags. In order to obtain high-quality clusters of the Twitter hashtags, we also propose a novel multi-view clustering technique that incorporates multiple different data types that can be used to describe how users interact with hashtags. The results of our multi-view clustering show that there are distinct temporal and topical trends present within COVID-19 Twitter discussion. In particular, we find that some topical clusters of hashtags shift over the course of the pandemic, while others are persistent throughout, and that there are distinct temporal trends in hashtag usage. This study is the first to use multi-view clustering to analyze hashtags and the first analysis of the greater trends of discussion occurring online during the COVID-19 pandemic.
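The abstract does not detail the proposed multi-view technique. As a point of comparison only, a simple baseline fuses views by averaging per-view similarity scores, links hashtag pairs whose fused similarity meets a threshold, and takes the connected components as clusters. The view functions, the 0.5 threshold and the averaging rule are assumptions, not the paper's method.

```python
from itertools import combinations


def multiview_clusters(items, views, threshold=0.5):
    """Cluster items by thresholding their mean similarity across views.

    `views` is a list of functions sim(a, b) -> [0, 1], one per data view
    (assumed examples: co-occurrence, shared users, temporal usage profile).
    Clusters are the connected components of the fused similarity graph.
    """
    parent = {i: i for i in items}  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(items, 2):
        fused = sum(view(a, b) for view in views) / len(views)
        if fused >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for i in items:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Each view could, for instance, score two hashtags by how many users they share or by how similar their daily usage curves are.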
Submitted 3 August, 2020;
originally announced August 2020.