Search | arXiv e-print repository

Appeal and Scope of Misinformation Spread by AI Agents and Humans

Authors: Lynnette Hui Xian Ng, Wenqi Zhou, Kathleen M. Carley

Abstract: This work examines the influence of misinformation and the role of AI agents, called bots, on social network platforms. To quantify the impact of misinformation, it proposes two new metrics based on attributes of tweet engagement and user network position: Appeal, which measures the popularity of the tweet, and Scope, which measures the potential reach of the tweet. In addition, it analyzes 5.8 million misinformation tweets on the COVID-19 vaccine discourse over three time periods: Pre-Vaccine, Vaccine Launch, and Post-Vaccine. Results show that misinformation was more prevalent during the first two periods. Human-generated misinformation tweets tend to have higher appeal and scope compared to bot-generated ones. Tweedie regression analysis reveals that human-generated misinformation tweets were most concerning during Vaccine Launch week, whereas bot-generated misinformation reached its highest appeal and scope during the Pre-Vaccine period.

Submitted 6 May, 2025; originally announced May 2025.

Comments: Accepted to AMCIS 2025
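The abstract above reports a Tweedie regression on Appeal and Scope but does not state the power parameter, link function, or covariates. As a generic reference only (not the authors' specification), a Tweedie generalized linear model with a log link assumes the following mean-variance relationship, where $p$ is the power parameter, $\phi$ the dispersion, and $\mathbf{x}_i$ the covariates of tweet $i$:

```latex
\operatorname{Var}(Y_i) = \phi\,\mu_i^{p}, \qquad \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad 1 < p < 2
```

For $1 < p < 2$ the Tweedie distribution is a compound Poisson-gamma distribution: it places positive probability on exactly zero while remaining continuous on positive values, which is one plausible reason to choose it for engagement-based scores where many tweets receive no engagement.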

arXiv:2504.12498

The Dual Personas of Social Media Bots

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Social media bots are AI agents that participate in online conversations. Most studies focus on the general bot and the malicious nature of these agents. However, bots have many different personas, each specialized towards a specific behavioral or content trait. Nor are bots singularly bad, because they are used to disseminate both good and bad information. In this article, we introduce fifteen agent personas of social media bots. These personas fall into two main categories: Content-Based Bot Persona and Behavior-Based Bot Persona. We also form yardsticks of the good-bad duality of the bots, elaborating on metrics of good and bad bot agents. Our work puts forth a guideline to inform bot detection regulation, emphasizing that policies should focus on how these agents are employed, rather than collectively terming bot agents as bad.

Submitted 16 April, 2025; originally announced April 2025.

arXiv:2504.00071

Navigating Decentralized Online Social Networks: An Overview of Technical and Societal Challenges in Architectural Choices

Authors: Ujun Jeong, Lynnette Hui Xian Ng, Kathleen M. Carley, Huan Liu

Abstract: Decentralized online social networks have evolved from experimental stages to operating at unprecedented scale, with broader adoption and more active use than ever before. Platforms like Mastodon, Bluesky, Hive, and Nostr have seen notable growth, particularly following the wave of user migration after Twitter's acquisition in October 2022. As new platforms build upon earlier decentralization architectures and explore novel configurations, it becomes increasingly important to understand how these foundations shape both the direction and limitations of decentralization. Prior literature primarily focuses on specific architectures, resulting in fragmented views that overlook how different social networks encounter similar challenges and complement one another. This paper fills that gap by presenting a comprehensive view of the current decentralized online social network landscape. We examine four major architectures: federated, peer-to-peer, blockchain, and hybrid, tracing their evolution and evaluating how they support core social networking functions. By linking these architectural aspects to real-world cases, our work provides a foundation for understanding the societal implications of decentralized social platforms.

Submitted 31 March, 2025; originally announced April 2025.

arXiv:2502.17542

Data Voids and Warning Banners on Google Search

Authors: Ronald E. Robertson, Evan M. Williams, Kathleen M. Carley, David Thiel

Abstract: The content moderation systems used by social media sites are a topic of widespread interest and research, but less is known about the use of similar systems by web search engines. For example, Google Search attempts to help its users navigate three distinct types of data voids (when the available search results are deemed low-quality, low-relevance, or rapidly changing) by placing one of three corresponding warning banners at the top of the search page. Here we collected 1.4M unique search queries shared on social media to surface Google's warning banners, examine when and why those banners were applied, and train deep learning models to identify data voids beyond Google's classifications. Across three data collection waves (Oct 2023, Mar 2024, Sept 2024), we found that Google returned a warning banner for about 1% of our search queries, with substantial churn in the set of queries that received a banner across waves. The low-quality banners, which warn users that their results "may not have reliable information on this topic," were especially rare, and their presence was associated with low-quality domains in the search results and conspiracy-related keywords in the search query. Low-quality banner presence was also inconsistent over short time spans, even when returning highly similar search results. In August 2024, low-quality banners stopped appearing on the SERPs we collected, but average search result quality remained largely unchanged, suggesting they may have been discontinued by Google.
Using our deep learning models to analyze both queries and search results in context, we identify 29 to 58 times more low-quality data voids than there were low-quality banners, and find a similar number after the banners had disappeared. Our findings point to the need for greater transparency on search engines' content moderation practices, especially around important events like elections.

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.14908

SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models

Authors: Peter Carragher, Nikitha Rao, Abhinand Jha, R Raghav, Kathleen M. Carley

Abstract: Vision language models (VLMs) demonstrate sophisticated multimodal reasoning yet are prone to hallucination when confronted with knowledge conflicts, impeding their deployment in information-sensitive contexts. While existing research addresses robustness in unimodal models, the multimodal domain lacks systematic investigation of cross-modal knowledge conflicts. This research introduces SegSub, a framework for applying targeted image perturbations to investigate VLM resilience against knowledge conflicts. Our analysis reveals distinct vulnerability patterns: while VLMs are robust to parametric conflicts (20% adherence rates), they exhibit significant weaknesses in identifying counterfactual conditions (<30% accuracy) and resolving source conflicts (<1% accuracy). The correlation between contextual richness and hallucination rate (r = -0.368, p = 0.003) reveals the kinds of images that are likely to cause hallucinations. Through targeted fine-tuning on our benchmark dataset, we demonstrate improvements in VLM knowledge conflict detection, establishing a foundation for developing hallucination-resilient multimodal systems in information-sensitive environments.

Submitted 9 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

arXiv:2502.13836

Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models

Authors: Peter Carragher, Abhinand Jha, R Raghav, Kathleen M. Carley

Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities in question answering (QA), but metrics for assessing their reliance on memorization versus retrieval remain underdeveloped. Moreover, while finetuned models are state-of-the-art on closed-domain tasks, general-purpose models like GPT-4o exhibit strong zero-shot performance. This raises questions about the trade-offs between memorization, generalization, and retrieval. In this work, we analyze the extent to which multimodal retrieval-augmented VLMs memorize training data compared to baseline VLMs. Using the WebQA benchmark, we contrast finetuned models with baseline VLMs on multihop retrieval and question answering, examining the impact of finetuning on data memorization. To quantify memorization in end-to-end retrieval and QA systems, we propose several proxy metrics by investigating instances where QA succeeds despite retrieval failing. In line with existing work, we find that finetuned models rely more heavily on memorization than retrieval-augmented VLMs, and achieve higher accuracy as a result (72% vs 52% on WebQA test set). Finally, we present the first empirical comparison of the parametric effect between text and visual modalities. Here, we find that image-based questions have parametric response rates that are consistently 15-25% higher than for text-based questions in the WebQA dataset.
As such, our measures pose a challenge for future work, both to account for differences in model memorization across different modalities and more generally to reconcile memorization and generalization in joint Retrieval-QA tasks.

Submitted 15 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.
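The abstract above builds proxy metrics from "instances where QA succeeds despite retrieval failing" but does not give their formulas. A minimal sketch of one such proxy, assuming each evaluated question is reduced to two booleans (whether retrieval found the gold sources, and whether the answer was judged correct); the names `QAResult` and `parametric_response_rate` are hypothetical, not the paper's:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class QAResult:
    retrieval_ok: bool  # retrieval surfaced the gold source(s) for this question
    answer_ok: bool     # the final answer was judged correct


def parametric_response_rate(results: Iterable[QAResult]) -> Optional[float]:
    """Fraction of retrieval failures on which the answer was still correct.

    A high value suggests the model answers from parametric memory rather
    than from retrieved evidence. Returns None if retrieval never failed.
    """
    failures = [r for r in results if not r.retrieval_ok]
    if not failures:
        return None
    return sum(r.answer_ok for r in failures) / len(failures)


if __name__ == "__main__":
    results = [
        QAResult(retrieval_ok=True, answer_ok=True),
        QAResult(retrieval_ok=False, answer_ok=True),
        QAResult(retrieval_ok=False, answer_ok=False),
        QAResult(retrieval_ok=False, answer_ok=True),
    ]
    print(parametric_response_rate(results))  # 2 of 3 failures still answered correctly
```

Splitting the results by question modality before calling the function would give per-modality rates of the kind the abstract compares, though the paper's exact metric definitions may differ.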

arXiv:2501.18839

Social Cyber Geographical Worldwide Inventory of Bots

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Social Cyber Geography is the space in the digital cyber realm that is produced through social relations. Communication in the social media ecosystem happens not only because of human interactions but is also fueled by algorithmically controlled bot agents. Most studies have not looked at the social cyber geography of bots because they focus on bot activity within a single country. Since bots are built with universally available programming technology, how prevalent are they throughout the world? To quantify bot activity worldwide, we perform a multilingual and geospatial analysis on a large dataset of social data collected from X during the Coronavirus pandemic in 2021. This pandemic affected most of the world and thus is a common topic of discussion. Our dataset consists of ~100 million posts generated by ~31 million users. Most bot studies focus only on English-speaking countries, because most bot detection algorithms are built for the English language. However, only 47% of the bots write in English. To accommodate multiple languages, we built Multilingual BotBuster, a multi-language bot detection algorithm, to identify the bots in this diverse dataset. We also created a Geographical Location Identifier to swiftly identify the countries a user affiliates with in their description. Our results show that bots can appear to move from one country to another, but the language they write in remains relatively constant. Bots distribute narratives on distinct topics related to their self-declared country affiliation.
Finally, despite the diverse distribution of bot locations around the world, the proportion of bots per country is about 20%. Our work stresses the importance of a united analysis of the cyber and physical realms, combining both spheres to inventory the language and location of social media bots and understand their communication strategies.

Submitted 30 January, 2025; originally announced January 2025.

arXiv:2501.00855

What is a Social Media Bot? A Global Comparison of Bot and Human Characteristics

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Chatter on social media is 20% bots and 80% humans. Chatter by bots and humans is consistently different: bots tend to use linguistic cues that can be easily automated while humans use cues that require dialogue understanding. Bots use words that match the identities they choose to present, while humans may send messages that are not related to the identities they present. Bots and humans differ in their communication structure: sampled bots have a star interaction structure, while sampled humans have a hierarchical structure. These conclusions are based on a large-scale analysis of social media tweets from ~200 million users across 7 events. Social media bots took the world by storm when social-cybersecurity researchers realized that social media users consisted not only of humans but also of artificial agents called bots. These bots wreak havoc online by spreading disinformation and manipulating narratives. Most research on bots is based on special-purpose definitions, mostly predicated on the event studied. This article begins by asking, "What is a bot?", and studies the underlying principles of how bots differ from humans. We develop a first-principles definition of a social media bot. With this definition as a premise, we systematically compare characteristics of bots and humans across global events, and reflect on how the software-programmed bot is an Artificial Intelligence algorithm and on its potential to evolve as technology advances. Based on our results, we provide recommendations for the use and regulation of bots.
Finally, we discuss open challenges and future directions: Detect, to systematically identify these automated and potentially evolving bots; Differentiate, to evaluate the goodness of the bot in terms of its content postings and relationship interactions; Disrupt, to moderate the impact of malicious bots.

Submitted 25 February, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

arXiv:2407.19406

Moral and emotional influences on attitude stability towards COVID-19 vaccines on social media

Authors: Samantha C. Phillips, Lynnette Hui Xian Ng, Wenqi Zhou, Kathleen M. Carley

Abstract: Effective public health messaging benefits from understanding antecedents to unstable attitudes that are more likely to be influenced. This work investigates the relationship between moral and emotional bases for attitudes towards COVID-19 vaccines and variance in stance. Evaluating nearly 1 million X users over a two-month period, we find that emotional language in tweets about COVID-19 vaccines is largely associated with more variation in the stance of the posting user, except for anger and surprise. The strength of COVID-19 vaccine attitudes associated with moral values varies across foundations. Most notably, liberty is consistently used by users with little or no variation in stance, while fairness and sanctity are used by users with more variation. Our work has implications for designing constructive pro-vaccine messaging and identifying receptive audiences.

Submitted 28 July, 2024; originally announced July 2024.

Comments: Accepted to SBP-BRiMS 2024

arXiv:2406.11423

Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains

Authors: Evan M. Williams, Peter Carragher, Kathleen M. Carley

Abstract: Proactive content moderation requires platforms to rapidly and continuously evaluate the credibility of websites. Leveraging the direct and indirect paths users follow to unreliable websites, we develop a website credibility classification and discovery system that integrates both webgraph and large-scale social media contexts. We additionally introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines, and provide the first exploration of their usage on social media. Our graph neural networks that combine webgraph and social media contexts achieve state-of-the-art results in website credibility classification and significantly improve the top-k identification of unreliable domains. Additionally, we release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.

Submitted 17 June, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.07293

Exploring Cognitive Bias Triggers in COVID-19 Misinformation Tweets: A Bot vs. Human Perspective

Authors: Lynnette Hui Xian Ng, Wenqi Zhou, Kathleen M. Carley

Abstract: During the COVID-19 pandemic, misinformation proliferated rapidly on social media. Automated Bot authors are believed to be significant contributors to this surge. It is hypothesized that Bot authors deliberately craft online misinformation aimed at triggering and exploiting human cognitive biases, thereby enhancing tweet engagement and persuasive influence. This study investigates this hypothesis by studying triggers of biases embedded in Bot-authored misinformation and comparing them with their counterparts in Human-authored misinformation. We compiled a Misinfo Dataset that contains COVID-19 vaccine-related misinformation tweets from Twitter, annotated by author identity (Bots vs. Humans), during the vaccination period from July 2020 to July 2021. We developed an algorithm to computationally automate the extraction of triggers for eight cognitive biases. Our analysis revealed that Availability Bias, Cognitive Dissonance, and Confirmation Bias were most commonly present in misinformation, with a greater prevalence in Bot-authored tweets and distinct patterns in how Humans and Bots use bias triggers. We further linked these bias triggers with engagement metrics, inferring their potential influence on tweet engagement and persuasiveness. Overall, our findings indicate that bias-triggering tactics have been more influential on Bot-authored tweets than on Human-authored tweets. While certain bias triggers boosted engagement for Bot-authored tweets, other bias triggers unexpectedly decreased it.
Conversely, triggers of most biases appeared to be unrelated to the engagement of Human-authored tweets. Our work sheds light on the differential utilization and effect of persuasion strategies between Bot-authored and Human-authored misinformation from the lens of human biases, offering insights for the development of effective countermeasures.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05246

doi 10.36190/2024.09

Blended Bots: Infiltration through Identity Deception on Social Media

Authors: Samantha C. Phillips, Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Bots are automated social media users that can be used to amplify (mis)information and sow harmful discourse. To influence users effectively, bots can be generated to reproduce human user behavior. Indeed, people tend to trust information coming from users with profiles that fit roles they expect to exist, such as users with gender role stereotypes. In this work, we examine differences in the types of identities in profiles of human and bot accounts, with a focus on combinations of identities that represent gender role stereotypes. We find that some types of identities differentiate between human and bot profiles, confirming that this approach can be useful in distinguishing between human and bot accounts on social media. However, contrary to our expectations, we reveal that gender bias is expressed more in human accounts than in bots overall. Although bots show less gender bias overall, we provide examples of identities in bot profiles with strong associations with gender identities, such as those related to technology, finance, sports, and horoscopes. Finally, we discuss implications for designing constructive social media bot detection training materials.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures

arXiv:2405.06634

Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark

Authors: Evan M. Williams, Kathleen M. Carley

Abstract: We evaluate the zero-shot ability of GPT-4 and LLaVA to perform simple Visual Network Analysis (VNA) tasks on small-scale graphs. We evaluate the Vision Language Models (VLMs) on 5 tasks related to three foundational network science concepts: identifying nodes of maximal degree on a rendered graph, identifying whether signed triads are balanced or unbalanced, and counting components. The tasks are structured to be easy for a human who understands the underlying graph-theoretic concepts, and can all be solved by counting the appropriate elements in graphs. We find that while GPT-4 consistently outperforms LLaVA, both models struggle with every visual network analysis task we propose. We publicly release the first benchmark for the evaluation of VLMs on foundational VNA tasks.

Submitted 10 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: 11 pages, 3 figures
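The three concepts behind the VNA benchmark tasks have short exact solutions when the graph is available as data rather than as an image, which is what makes the models' failures notable. A minimal Python sketch of the ground-truth computations, assuming graphs are given as undirected edge lists and triad signs as +1/-1; the helper names are hypothetical and not taken from the released benchmark:

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def max_degree_nodes(edges: Iterable[Edge]) -> Set[Hashable]:
    """Return every node whose degree equals the maximum degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return set()
    top = max(degree.values())
    return {node for node, d in degree.items() if d == top}


def is_balanced_triad(s_ab: int, s_bc: int, s_ca: int) -> bool:
    """A signed triad (signs +1/-1) is balanced iff the product of its edge
    signs is positive, i.e. it has an even number of negative edges."""
    return s_ab * s_bc * s_ca > 0


def count_components(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> int:
    """Count connected components with union-find; every edge endpoint
    must appear in `nodes`."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    return len({find(n) for n in parent})


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f")]
    print(max_degree_nodes(edges))             # {'a'}
    print(count_components("abcdefg", edges))  # 3
    print(is_balanced_triad(+1, -1, -1))       # True
```

Here node `a` has degree 3, and the edge list forms three components once the isolated node `g` is included; a triad with exactly one negative edge is unbalanced.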

arXiv:2404.15509

SMI-5: Five Dimensions of Social Media Interaction for Platform (De)Centralization

Authors: Lynnette Hui Xian Ng, Samantha C. Phillips, Kathleen M. Carley

Abstract: Web 3.0 focuses on the decentralization of the internet and creating a system of interconnected and independent computers for improved privacy and security. We extend the idea of decentralizing the web to the social media space and ask: in the context of social media, what does "decentralization" mean? Does decentralization of social media affect user interactions? We put forth the notion that decentralization in social media does not solely take place at the physical network level, but can be compartmentalized across the entire social media stack. This paper puts forth SMI-5: the five dimensions of social media interaction for describing the (de)centralization of social platforms. We then illustrate, through a case study, that user interactions differ based on the slices of the SMI layer analyzed, highlighting the importance of understanding the (de)centralization of social media platforms from a more encompassing perspective rather than only the physical network.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 6 pages, 2 figures

arXiv:2404.08869

Misinformation Resilient Search Rankings with Webgraph-based Interventions

Authors: Peter Carragher, Evan M. Williams, Kathleen M. Carley

Abstract: The proliferation of unreliable news domains on the internet has had wide-reaching negative impacts on society. We introduce and evaluate interventions aimed at reducing traffic to unreliable news domains from search engines while maintaining traffic to reliable domains. We build these interventions on the principles of fairness (penalize sites for what is in their control), generality (label/fact-check agnostic), targeted (increase the cost of adversarial behavior), and scalability (works at webscale). We refine our methods on small-scale webdata as a testbed and then generalize the interventions to a large-scale webgraph containing 93.9M domains and 1.6B edges. We demonstrate that our methods penalize unreliable domains far more than reliable domains in both settings and we explore multiple avenues to mitigate unintended effects on both the small-scale and large-scale webgraph experiments. These results indicate the potential of our approach to reduce the spread of misinformation and foster a more reliable online information ecosystem. This research contributes to the development of targeted strategies to enhance the trustworthiness and quality of search engine results, ultimately benefiting users and the broader digital community.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.02338

Why do people think liberals drink lattes? How social media afforded self-presentation can shape subjective social sorting

Authors: Samantha C. Phillips, Kathleen M. Carley, Kenneth Joseph

Abstract: Social sorting, the alignment of social identities, affiliations, and/or preferences with partisan groups, can increase in-party attachment and decrease out-party tolerance. We propose that self-presentation afforded by social media profiles fosters subjective social sorting by shaping perceptions of alignments between non-political and political identifiers. Unlike previous work, we evaluate social sorting of naturally occurring, public-facing identifiers in social media profiles selected using a bottom-up approach. Using a sample of 50 million X users collected five times between 2016 and 2018, we identify users who define themselves politically and generate networks representing simultaneous co-occurrence of identifiers in profiles. We then systematically measure the alignment of non-political identifiers along political dimensions, revealing alignments that reinforce existing associations, reveal unexpected relationships, and reflect online and offline events. We find that while most identifiers bridge political divides, social sorting of identifiers along political lines is occurring to some degree in X profiles. Our results have implications for understanding the role of social media in facilitating (the perception of) polarization and polarization mitigation strategies such as bridging interventions and algorithms.

Submitted 23 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 29 pages, 2 figures
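The social-sorting abstract above generates networks from the co-occurrence of identifiers within the same profile. A minimal sketch of that construction, assuming identifiers have already been extracted from each profile (the abstract does not describe the extraction step) and weighting each edge by the number of profiles in which the pair appears; `cooccurrence_network` and the sample identifiers are hypothetical:

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def cooccurrence_network(profiles: Iterable[Iterable[str]]) -> Counter:
    """Map each unordered identifier pair (a, b), with a < b, to the number
    of profiles that contain both identifiers."""
    edges: Counter = Counter()
    for identifiers in profiles:
        # Deduplicate within a profile and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(identifiers)), 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    profiles = [
        ["liberal", "latte", "runner"],
        ["liberal", "latte"],
        ["conservative", "runner"],
    ]
    net = cooccurrence_network(profiles)
    print(net[("latte", "liberal")])  # 2
```

Measuring how non-political identifiers align along political dimensions would then operate on weights like these; the paper's alignment measure is not specified in the abstract and is not reproduced here.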

arXiv:2402.14203

An Exploratory Analysis of COVID Bot vs Human Disinformation Dissemination stemming from the Disinformation Dozen on Telegram

Authors: Lynnette Hui Xian Ng, Ian Kloo, Kathleen M. Carley

Abstract: The COVID-19 pandemic of 2021 led to a worldwide health crisis that was accompanied by an infodemic. A group of 12 social media personalities, dubbed the "Disinformation Dozen," were identified as key in spreading disinformation regarding the COVID-19 virus, treatments, and vaccines. This study focuses on the spread of disinformation propagated by this group on Telegram, a mobile messaging and social media platform. After segregating users into three groups (the Disinformation Dozen, bots, and humans), we perform an investigation with a dataset of Telegram messages from January to June 2023, comparatively analyzing temporal, topical, and network features. We observe that the Disinformation Dozen are highly involved in the initial dissemination of disinformation but are not the main drivers of its propagation. Bot users are extremely active in conversation threads, while human users are active propagators of information, disseminating posts between Telegram channels through the forwarding mechanism.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted at Journal of Computational Social Science

arXiv:2401.14607 [pdf, other]

Assembling a Multi-Platform Ensemble Social Bot Detector with Applications to US 2020 Elections

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Bots have been in the spotlight of many social media studies, as they have been observed participating in the manipulation of information and opinions on social media. These studies analyzed the activity and influence of bots in a variety of contexts: elections, protests, health communication and so forth. A prerequisite for these analyses is the identification of bot accounts to separate the classes of social media users. In this work, we propose an ensemble method for bot detection, designing a multi-platform bot detection architecture that addresses several problems along the bot detection pipeline: it handles incomplete data input, requires minimal feature engineering, optimizes a classifier for each data field, and eliminates the need for a threshold value for classification determination. With these design decisions, we generalize our bot detection framework across Twitter, Reddit and Instagram. We also perform feature importance analysis, observing that the entropy of names and the number of interactions (retweets/shares) are important factors in bot determination. Finally, we apply our multi-platform bot detector to the US 2020 presidential elections to identify and analyze bot activity across multiple social media platforms, showcasing the differences in the online discourse of bots from different platforms.

Submitted 1 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted at Social Network Analysis and Mining

arXiv:2401.06582 [pdf, other]

doi 10.1177/20539517241231275

Cyborgs for strategic communication on social media

Authors: Lynnette Hui Xian Ng, Dawn C. Robertson, Kathleen M. Carley

Abstract: Social media platforms are a key venue for information consumption and dissemination. Key figures like politicians, celebrities and activists have leveraged their wide user base for strategic communication. Strategic communications, or StratCom, is the deliberate act of information creation and distribution. Its techniques are used by these key figures for establishing their brand and amplifying their messages. Automated scripts are used on top of personal touches to perform these tasks quickly and effectively. The combination of automation and manual online posting creates a Cyborg social media profile, which is a hybrid between bot and human. In this study, we establish a quantitative definition for a Cyborg account: an account that is detected as a bot in one time window and identified as a human in another. This definition makes use of frequent changes of bot classification labels and large differences in bot likelihood scores to identify Cyborgs. We perform a large-scale analysis across over 3.1 million users from Twitter collected from two key events, the 2020 Coronavirus pandemic and the 2020 US Elections. We extract Cyborgs from two datasets and employ tools from network science, natural language processing and manual annotation to characterize Cyborg accounts. Our analyses find that Cyborg accounts are mostly constructed for strategic communication uses, have a strong duality in their bot/human classification and are tactically positioned in the social media network, aiding these accounts to promote their desired content. Cyborgs are also found to have long online lives, indicating either their ability to evade bot detectors or the willingness of platforms to allow their operations.
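The Cyborg definition above (an account labeled a bot in one time window and a human in another, with frequent label changes and large differences in bot likelihood scores) can be illustrated with a minimal Python sketch. It assumes per-window bot likelihood scores in [0, 1] are already available from some bot detector; the threshold, minimum number of label flips and minimum score gap are hypothetical parameters chosen for illustration, not values or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class CyborgCheck:
    labels: list[str]   # per-window "bot"/"human" labels
    flips: int          # number of label changes between consecutive windows
    score_gap: float    # max minus min bot likelihood score
    is_cyborg: bool


def check_cyborg(window_scores, threshold=0.5, min_flips=1, min_gap=0.4):
    """Flag an account whose bot label changes across time windows.

    window_scores: bot likelihood scores in [0, 1], one per time window,
    in chronological order. threshold, min_flips and min_gap are
    hypothetical illustration defaults, not values from the paper.
    """
    labels = ["bot" if s >= threshold else "human" for s in window_scores]
    # Count how often the label changes between consecutive windows.
    flips = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    # Spread between the most and least bot-like windows.
    gap = max(window_scores) - min(window_scores) if window_scores else 0.0
    is_cyborg = flips >= min_flips and gap >= min_gap
    return CyborgCheck(labels, flips, gap, is_cyborg)
```

For example, scores of `[0.9, 0.2, 0.85]` give labels bot, human, bot with two flips and a large gap, so the account is flagged; `[0.55, 0.45]` flips once but the scores barely differ, so it is not. Requiring both conditions mirrors the two signals named in the abstract and avoids flagging accounts that merely hover around the threshold.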

Submitted 12 January, 2024; originally announced January 2024.

Comments: To appear in Big Data and Society

arXiv:2401.05501 [pdf, other]

doi 10.1140/epjds/s13688-023-00440-3

Deflating the Chinese Balloon: Types of Twitter Bots in US-China balloon incident

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: As digitalization increases, countries employ digital diplomacy, harnessing digital resources to project their desired image. Digital diplomacy also encompasses the interactivity of digital platforms, providing a trove of public opinion that diplomatic agents can collect. Social media bots actively participate in political events by influencing political communication and promoting coordinated narratives to influence human behavior. This article provides a methodology for identifying three types of bots (General Bots, News Bots and Bridging Bots) and then identifies these classes of bots on Twitter during a diplomatic incident involving the United States and China. Using a series of computational methods, this article examines the impact of bots on the topics disseminated, their influence, and their use of information maneuvers within the social communication network. Among other findings, we observe that all three types of bots are present across the two countries; bots geotagged to the US are generally concerned with the balloon location, while those geotagged to China discuss topics related to escalating tensions; and the bots perform positive narrative and network information maneuvers to different extents.

Submitted 10 January, 2024; originally announced January 2024.

Journal ref: EPJ Data Sci. 12, 63 (2023)

arXiv:2401.02379 [pdf, other]

doi 10.1609/icwsm.v18i1.31309

Detection and Discovery of Misinformation Sources using Attributed Webgraphs

Authors: Peter Carragher, Evan M. Williams, Kathleen M. Carley

Abstract: Website reliability labels underpin almost all research in misinformation detection. However, misinformation sources often exhibit transient behavior, which makes many such labeled lists obsolete over time. We demonstrate that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains. We demonstrate the success of graph neural networks in detecting news site reliability using these attributed webgraphs, and show that our baseline news site reliability classifier outperforms current SoTA methods on the PoliticalNews dataset, achieving an F1 score of 0.96. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.

Submitted 26 March, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.07613 [pdf, other]

Comparison of Online Maneuvers by Authentic and Inauthentic Local News Organizations

Authors: Christine Sowa Lepird, Kathleen M. Carley

Abstract: Inauthentic local news organizations, otherwise known as pink slime, have become a serious problem exploiting the trust of local news since their creation ahead of the 2020 U.S. Presidential election. In this paper, we apply the BEND framework, a methodology for classifying social media posts into sixteen network and narrative maneuvers, to compare and contrast how pink slime sites and authentic local news sites are shared on Facebook Pages. We find that pink slime sites implemented more positive narrative maneuvers than local news sharers did. Both news types utilized distraction but to fulfill separate goals: pink slime used it against local and state elections, while authentic local news focused on national elections and figureheads. Furthermore, local news employed the neutralize tactic in order to reduce positive sentiment around national politicians.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.08429 [pdf, other]

doi 10.1109/WSC60868.2023.10407855

Purpose in the Machine: Do Traffic Simulators Produce Distributionally Equivalent Outcomes for Reinforcement Learning Applications?

Authors: Rex Chen, Kathleen M. Carley, Fei Fang, Norman Sadeh

Abstract: Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs). A key question is to what extent their modelling assumptions affect the capabilities of ITSs to adapt to various scenarios when deployed in the real world. This work focuses on two simulators commonly used to train reinforcement learning (RL) agents for traffic applications, CityFlow and SUMO. A controlled virtual experiment varying driver behavior and simulation scale finds evidence against distributional equivalence in RL-relevant measures from these simulators, with the root mean squared error and KL divergence being significantly greater than 0 for all assessed measures. While granular real-world validation generally remains infeasible, these findings suggest that traffic simulators are not a deus ex machina for RL training: understanding the impacts of inter-simulator differences is necessary to train and deploy RL-based ITSs.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 12 pages; accepted version, published at the 2023 Winter Simulation Conference (WSC '23)

arXiv:2310.10851 [pdf, other]

doi 10.1007/978-3-031-43129-6_12

Tracking China's cross-strait bot networks against Taiwan

Authors: Charity S. Jacobs, Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: The cross-strait relationship between China and Taiwan is marked by increasing hostility around potential reunification. We analyze an unattributed bot network and how repeater bots engaged in an influence campaign against Taiwan following US House Speaker Nancy Pelosi's visit to Taiwan in 2022. We examine the message amplification tactics employed by four key bot sub-communities, the widespread dissemination of information across multiple platforms through URLs, and the potential targeted audiences of this bot network. We find that URL link sharing reveals circumvention of YouTube suspensions, in addition to the potential effectiveness of algorithmic bot connectivity in appearing less bot-like, and we detail a sequence of coordination within a sub-community for message amplification. We additionally find the narratives and targeted audience potentially shifting after account activity discrepancies, demonstrating how dynamically these bot networks can operate.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 10 pages with 5 figures. Published in Conference Proceedings for Social, Cultural, and Behavioral Modeling (SBP-BRiMS 2023)

arXiv:2307.08511 [pdf, other]

doi 10.1007/978-3-031-43129-6_16

Simulation of Stance Perturbations

Authors: Peter Carragher, Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: In this work, we analyze the circumstances under which social influence operations are likely to succeed. These circumstances include the selection of Confederate agents to execute intentional perturbations and the selection of Perturbation strategies. We use Agent-Based Modelling (ABM) as a simulation technique to observe the effect of intentional stance perturbations on scale-free networks. We develop a co-evolutionary social influence model to interrogate the tradeoff between perturbing stance and maintaining influence when these variables are linked through homophily. In our experiments, we observe that stances in a network converge given sufficient simulation timesteps, that influential agents are the best Confederates, and that the optimal Perturbation strategy involves the cascade of local ego networks. Finally, our experimental results support the theory of tipping points and are in line with empirical findings suggesting that 20-25% of agents need to be Confederates before a change in consensus can be achieved.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.15745 [pdf, other]

Identity Construction in a Misogynist Incels Forum

Authors: Michael Miller Yoder, Chloe Perry, David West Brown, Kathleen M. Carley, Meredith L. Pruden

Abstract: Online communities of involuntary celibates (incels) are a prominent source of misogynist hate speech. In this paper, we use quantitative text and network analysis approaches to examine how identity groups are discussed on incels-dot-is, the largest black-pilled incels forum. We find that this community produces a wide range of novel identity terms and, while terms for women are most common, mentions of other minoritized identities are increasing. An analysis of the associations made with identity groups suggests an essentialist ideology where physical appearance, as well as gender and racial hierarchies, determine human value. We discuss implications for research into automated misogynist hate speech detection.

Submitted 9 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Workshop on Online Abuse and Harms (WOAH) 2023; Minor edits to author names and abstracts in most recent version

arXiv:2306.15732 [pdf, other]

A Weakly Supervised Classifier and Dataset of White Supremacist Language

Authors: Michael Miller Yoder, Ahmad Diab, David West Brown, Kathleen M. Carley

Abstract: We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.

Submitted 27 June, 2023; originally announced June 2023.

Comments: ACL 2023 short

arXiv:2302.10172 [pdf]

Identity-Based Attribute Prototypes Distinguish Communities on Twitter

Authors: Thomas Magelinski, Kathleen M. Carley

Abstract: This paper examines the link between conversational communities on Twitter and their members' expressions of social identity. It specifically tests the presence of community prototypes, or collections of attributes which define a group through meta-contrast: high in-group cohesiveness and high out-group distinctiveness. Analyzing four datasets of political discussions ranging from roughly 4 to 30 million tweets, we find strong evidence for the presence of distinctive community prototypes. We observe that community prototypes are constructed through hashtags, mentions, emojis, and identity-phrases. This finding situates prior work on the identity signaling of individual users within a larger group process playing out within communication communities. Community prototypes are then constructed for specific communities by measuring the salience of identity signals for each community. Observed community prototypes tend to be based on political ideology, location and language, or general interests. While the presence of community prototypes may be a natural group behavior, the high levels of contrast observed between communities displaying ideologically opposed prototypes indicate the presence of identity-related polarization.

Submitted 20 February, 2023; originally announced February 2023.

arXiv:2212.13221 [pdf, other]

A Combined Synchronization Index for Grassroots Activism on Social Media

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Social media has provided a citizen voice, giving rise to grassroots collective action, where users deploy a concerted effort to disseminate online narratives and even carry out offline protests. Sometimes these collective actions are aided by inorganic synchronization arising from bot actors. It is thus important to identify the synchronicity of emerging discourse on social media and the indications of organic/inorganic activity within the conversations. This provides a way of profiling an event for the possibility of offline protests and violence. In this study, we build on past definitions of synchronous activity on social media (simultaneous user action) and develop a Combined Synchronization Index (CSI), which adopts a hierarchical approach to measuring user synchronicity. We apply this index to six political and social activism events on Twitter and analyze three action types: synchronicity by hashtag, URL and @mentions. The CSI provides an overall quantification of synchronization across all action types within an event, which allows the six events to be ranked along a spectrum of synchronicity. Human users have higher synchronization scores than bot users in most events, and bot-human pairs exhibit the most synchronized activity across all events compared to the other pairs (i.e., bot-bot and human-human). We further rely on the harmony and dissonance of CSI-Network scores with network centrality metrics to observe the presence of organic/inorganic synchronization. We hope this work aids in investigating synchronized action within social media in a collective manner.

Submitted 26 December, 2022; originally announced December 2022.

arXiv:2210.10839 [pdf, other]

How Hate Speech Varies by Target Identity: A Computational Analysis

Authors: Michael Miller Yoder, Lynnette Hui Xian Ng, David West Brown, Kathleen M. Carley

Abstract: This paper investigates how hate speech varies in systematic ways according to the identities it targets. Across multiple hate speech datasets annotated for targeted identities, we find that classifiers trained on hate speech targeting specific identity groups struggle to generalize to other targeted identities. This provides empirical evidence for differences in hate speech by target identity; we then investigate which patterns structure this variation. We find that the targeted demographic category (e.g. gender/sexuality or race/ethnicity) appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group. We also find that words associated with hate speech targeting specific identities often relate to stereotypes, histories of oppression, current social movements, and other social contexts specific to identities. These experiments suggest the importance of considering targeted identity, as well as the social contexts associated with these identities, in automated hate speech classification.

Submitted 7 December, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: CoNLL 2022 camera-ready + fixed minor figure error

arXiv:2207.13658 [pdf, other]

BotBuster: Multi-platform Bot Detection Using A Mixture of Experts

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Despite rapid development, current bot detection models still face challenges in dealing with incomplete data and cross-platform applications. In this paper, we propose BotBuster, a social bot detector built on a mixture of experts approach. Each expert is trained to analyze a portion of account information (e.g., the username), and the experts are combined to estimate the probability that the account is a bot. Experiments on 10 Twitter datasets show that BotBuster outperforms popular bot-detection baselines (avg F1=73.54 vs avg F1=45.12). BotBuster also achieves F1=60.04 on a Reddit dataset and F1=60.92 on an external evaluation set. Further analysis shows that only 36 posts are required for a stable bot classification. Investigation shows that bot post features have changed across the years and can be difficult to differentiate from human features, making bot detection a difficult and ongoing problem.
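The idea of combining per-field experts while tolerating incomplete data can be sketched in a few lines of Python. This is not BotBuster's implementation: the abstract does not say how expert outputs are combined, so the weighted average below, the `weights` mapping, and the two toy experts (`username_expert`, `description_expert`) are hypothetical stand-ins for trained models.

```python
from typing import Callable, Mapping, Optional

Expert = Callable[[str], float]  # maps one field's value to a bot probability


def combine_experts(account: Mapping[str, Optional[str]],
                    experts: Mapping[str, Expert],
                    weights: Mapping[str, float]) -> Optional[float]:
    """Weighted average over the experts whose input field is present.

    Assumed combination rule for illustration only. Missing or empty
    fields are skipped; returns None if no expert could run.
    """
    total, weight_sum = 0.0, 0.0
    for field, expert in experts.items():
        value = account.get(field)
        if not value:  # incomplete data: skip this expert
            continue
        w = weights.get(field, 1.0)
        total += w * expert(value)
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else None


# Hypothetical toy experts standing in for trained models.
def username_expert(name: str) -> float:
    # Usernames dominated by digits look more automated.
    return sum(c.isdigit() for c in name) / len(name)


def description_expert(text: str) -> float:
    return 0.9 if "follow back" in text.lower() else 0.1


EXPERTS = {"username": username_expert, "description": description_expert}
```

With only a username available, the estimate falls back to that single expert; when both fields are present, the weights decide how much each expert contributes. Skipping absent fields rather than imputing them is one simple way to keep a prediction available for partial profiles.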

Submitted 27 July, 2022; originally announced July 2022.

Comments: Accepted to ICWSM 2023

arXiv:2207.13055 [pdf, other]

Contextualizing Online Conversational Networks

Authors: Thomas Magelinski, Kathleen M. Carley

Abstract: Online social connections occur within a specific conversational context. Prior work in network analysis of social media data attempts to contextualize data through filtering. We propose a method of contextualizing online conversational connections automatically and illustrate this method with Twitter data. Specifically, we detail a graph neural network model capable of representing tweets in a vector space based on their text, hashtags, URLs, and neighboring tweets. Once tweets are represented, clusters of tweets uncover conversational contexts. We apply our method to a dataset with 4.5 million tweets discussing the 2020 US election. We find that even filtered data contains many different conversational contexts, with users engaging in multiple contexts. Central users in the contextualized networks differ significantly from central users in the overall network. This result implies that standard network analysis on social media data can be unreliable in the face of multiple conversational contexts. We further demonstrate that dynamic analysis of conversational contexts gives a qualitative understanding of conversational flow.

Submitted 26 July, 2022; originally announced July 2022.

Comments: To appear in ICWSM'23

arXiv:2207.07937 [pdf, other]

From Curious Hashtags to Polarized Effect: Profiling Coordinated Actions in Indonesian Twitter Discourse

Authors: Adya Danaditya, Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Coordinated campaigns in the digital realm have become an increasingly important area of study due to their potential to cause political polarization and threats to security through real-world protests and riots. In this paper, we introduce a methodology to profile two case studies of coordinated actions in Indonesian Twitter discourse. Combining network and narrative analysis techniques, this six-step pipeline begins with DISCOVERY of coordinated actions through hashtag-hijacking; identifying WHO is involved through the extraction of discovered agents; framing what these actors did (DID WHAT) in terms of information manipulation maneuvers; determining TO WHOM these actions were targeted through correlation analysis; understanding WHY through narrative analysis; and describing the IMPACT through analysis of the observed conversation polarization. We describe two case studies, one international and one regional, in the Indonesian Twittersphere. Through these case studies, we unearth two seemingly related coordinated activities, discovered by deviating hashtags that do not fit the discourse, characterize the coordinated group profile and interaction, and describe the impact of their activity on the online conversation.

Submitted 16 July, 2022; originally announced July 2022.

Comments: To appear in Social Network Analysis and Mining

arXiv:2206.10495 [pdf, other]

doi 10.1145/1122445.1122456

Online Coordination: Methods and Comparative Case Studies of Coordinated Groups across Four Events in the United States

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: Coordinated groups of user accounts working together in online social media can be used to manipulate the online discourse and are thus an important area of study. In this study, we work towards a general theory of coordination. There are many ways to coordinate groups online: semantic, social, referral and many more. Each represents a coordination dimension: the more dimensions of coordination present for one event, the stronger the coordination. We build on existing approaches that detect coordinated groups by identifying high levels of synchronized actions within a specified time window. A key concern with this approach is the selection of the time window. We propose a method that selects the optimal window size to accurately capture local coordination while avoiding the capture of coincidental synchronicity. With this enhanced method of coordination detection, we perform a comparative study across four events: US Elections Primaries 2020, Reopen America 2020, Capitol Riots 2021 and COVID Vaccine Release 2021. Herein, we explore three dimensions of coordination for each event (semantic, referral and social coordination) and perform group and user analysis within and among the events. This allows us to expose different user coordination behavior patterns and identify narratives and user support themes, hence estimating the degree and theme of coordination.
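The baseline idea this work builds on (flagging pairs of accounts that perform the same action within a specified time window) can be sketched in Python to show why the window size matters. This is a generic illustration under assumed inputs, namely `(user, item, timestamp_seconds)` tuples where an item might be a hashtag or URL; it is not the paper's optimal window selection method, and `synchronized_pairs` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def synchronized_pairs(actions, window):
    """Count, per user pair, how often both acted on the same item
    within `window` seconds of each other.

    actions: iterable of (user, item, timestamp_seconds) tuples,
    e.g. a hashtag or URL posted by an account at a given time.
    """
    by_item = defaultdict(list)
    for user, item, ts in actions:
        by_item[item].append((ts, user))

    pair_counts = Counter()
    for events in by_item.values():
        events.sort()  # chronological order lets us stop scanning early
        for i, (t1, u1) in enumerate(events):
            for t2, u2 in events[i + 1:]:
                if t2 - t1 > window:
                    break  # later events are even further apart
                if u1 != u2:
                    pair_counts[tuple(sorted((u1, u2)))] += 1
    return pair_counts


ACTIONS = [
    ("a", "#x", 0), ("b", "#x", 30), ("c", "#x", 400),
    ("a", "#y", 1000), ("b", "#y", 1010),
]
```

On this toy data, a 60-second window links only accounts `a` and `b` (twice), while a 600-second window also links `c` to both of them through a single shared hashtag. A wider window therefore sweeps in more pairs that may be coincidental, which is the trade-off the abstract's window selection method targets.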

Submitted 21 June, 2022; originally announced June 2022.
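The synchronized-action approach this paper builds on can be illustrated with a minimal sketch. It assumes each action is a hypothetical `(user, target, timestamp)` tuple. Here `target` is something two accounts can share (a hashtag, URL or retweeted post) and timestamps are in seconds. For every pair of users, the sketch counts how many distinct targets both acted on within `window` seconds of each other. The input format, the `window` parameter and the function name are illustrative assumptions. The paper's procedure for selecting the optimal window size is not reproduced here.

```python
from collections import defaultdict


def synchronized_pairs(actions, window):
    """Count, per user pair, the distinct targets both acted on within `window` seconds.

    actions: iterable of (user, target, timestamp) tuples (hypothetical format).
    Returns {(user_a, user_b): count} with user_a < user_b.
    """
    # Group events by target so only actions on the same target are compared.
    by_target = defaultdict(list)
    for user, target, ts in actions:
        by_target[target].append((ts, user))

    weights = defaultdict(int)
    for events in by_target.values():
        events.sort()
        synced = set()
        for i, (t_i, u_i) in enumerate(events):
            # Events are time-sorted, so stop once the gap exceeds the window.
            for t_j, u_j in events[i + 1:]:
                if t_j - t_i > window:
                    break
                if u_i != u_j:
                    synced.add(tuple(sorted((u_i, u_j))))
        # Credit each pair at most once per target.
        for pair in synced:
            weights[pair] += 1
    return dict(weights)
```

For example, `synchronized_pairs([("a", "#x", 0), ("b", "#x", 30), ("a", "#y", 100), ("b", "#y", 160)], window=60)` links `a` and `b` with weight 2. Shrinking `window` removes coincidental matches but can also drop genuine coordination. That trade-off is what motivates choosing the window carefully.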

arXiv:2206.03576

Coordinated through a Web of Images: Analysis of Image-based Influence Operations from China, Iran, Russia, and Venezuela

Authors: Lynnette Hui Xian Ng, J. D. Moffitt, Kathleen M. Carley

Abstract: State-sponsored online influence operations typically consist of coordinated accounts exploiting the online space to influence public opinion. Accounts associated with these operations use images and memes as part of their content generation and dissemination strategy to increase the effectiveness and engagement of the content. In this paper, we present a study of images from the PhoMemes 2022 Challenge originating from China, Iran, Russia, and Venezuela. First, we analyze the coordination of images within and across each country by quantifying image similarity. Then, we construct Image-Image networks and image clusters to identify key themes in the image influence operations. We derive the corresponding Account-Account networks to visualize the interaction between participating accounts within each country. Finally, we interpret the image content and network structure in the broader context of the organization and structure of influence operations in each country.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 8 pages, 1 table, 4 figures, to be published in ICWSM-2022 workshop proceedings
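The image-to-account pipeline in the abstract above can be sketched as follows. The abstract does not say how image similarity is quantified. This sketch assumes each image already has a perceptual hash stored as an integer. Two images count as similar when their hashes differ in at most `max_distance` bits (Hamming distance). Accounts are then linked when they posted the same image or similar images. The input format and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two integer image hashes."""
    return bin(h1 ^ h2).count("1")


def image_and_account_networks(posts, max_distance):
    """Build an Image-Image edge set and a weighted Account-Account network.

    posts: list of (account, image_id, image_hash) tuples, where image_hash is an
    assumed precomputed perceptual hash stored as an int.
    """
    hashes = {img: h for _, img, h in posts}
    # Image-Image network: connect images whose hashes are within max_distance bits.
    image_edges = {
        tuple(sorted((a, b)))
        for a, b in combinations(hashes, 2)
        if hamming(hashes[a], hashes[b]) <= max_distance
    }
    # Accounts that posted each image.
    posters = defaultdict(set)
    for account, img, _ in posts:
        posters[img].add(account)
    # Account-Account network: one unit of weight per linked image pair,
    # including an image paired with itself (the same image posted by two accounts).
    account_weights = defaultdict(int)
    linked = list(image_edges) + [(img, img) for img in posters]
    for a, b in linked:
        pairs = {tuple(sorted((u, v))) for u in posters[a] for v in posters[b] if u != v}
        for pair in pairs:
            account_weights[pair] += 1
    return image_edges, dict(account_weights)
```

Projecting image links onto accounts mirrors the abstract's step of deriving Account-Account networks from Image-Image networks. The hash-and-threshold similarity test is only a stand-in for whatever measure the authors used.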

arXiv:2112.07998

doi 10.1080/09546553.2021.2003785

Multi-modal Networks Reveal Patterns of Operational Similarity of Terrorist Organizations

Authors: Gian Maria Campedelli, Iain J. Cruickshank, Kathleen M. Carley

Abstract: Capturing the dynamics of operational similarity among terrorist groups is critical to providing actionable insights for counter-terrorism and intelligence monitoring. Yet, despite its theoretical and practical relevance, research addressing this problem is currently lacking. We tackle this problem by proposing a novel computational framework for detecting clusters of terrorist groups sharing similar behaviors, focusing on groups' yearly repertoire of deployed tactics, attacked targets, and utilized weapons. Specifically considering those organizations that have plotted at least 50 attacks from 1997 to 2018, accounting for a total of 105 groups responsible for more than 42,000 events worldwide, we offer three sets of results. First, we show that over the years global terrorism has been characterized by increasing operational cohesiveness. Second, we highlight that year-to-year stability in co-clustering among groups has been particularly high from 2009 to 2018, indicating temporal consistency of similarity patterns in the last decade. Third, we demonstrate that operational similarity between two organizations is driven by three factors: (a) their overall activity; (b) the difference in the diversity of their operational repertoires; (c) the difference in a combined measure of diversity and activity. Groups' operational preferences, geographical homophily and ideological affinity have no consistent role in determining operational similarity.

Submitted 15 December, 2021; originally announced December 2021.

Comments: 42 pages, 19 figures

Journal ref: Terrorism and Political Violence, 0(0), 1-20 (2021)

arXiv:2111.06515

RATE: Overcoming Noise and Sparsity of Textual Features in Real-Time Location Estimation

Authors: Yu Zhang, Wei Wei, Binxuan Huang, Kathleen M. Carley, Yan Zhang

Abstract: Real-time location inference of social media users underpins spatial applications such as localized search and event detection. While tweet text is the most commonly used feature in location estimation, most prior work suffers from either the noise or the sparsity of textual features. In this paper, we aim to tackle these two problems. We use topic modeling as a building block to characterize the geographic topic variation and lexical variation, so that "one-hot" encoding vectors are no longer used directly. We also incorporate other features which can be extracted through the Twitter streaming API to overcome the noise problem. Experimental results show that our RATE algorithm outperforms several benchmark methods, both in the precision of region classification and the mean distance error of latitude and longitude regression.

Submitted 11 November, 2021; originally announced November 2021.

Comments: 4 pages; Accepted to CIKM 2017; Some typos fixed

arXiv:2110.04899

Influencing the Influencers: Evaluating Person-to-Person Influence on Social Networks Using Granger Causality

Authors: Richard Kuzma, Iain J. Cruickshank, Kathleen M. Carley

Abstract: We introduce a novel method for analyzing person-to-person content influence on Twitter. Using an Ego-Alter framework and Granger Causality, we examine President Donald Trump (the Ego) and the people he retweets (Alters) as a case study. We find that each Alter has a different scope of influence across multiple topics and a different magnitude of influence on a given topic, and that the magnitude of a single Alter's influence can vary across topics. This work is novel in its focus on person-to-person influence and content-based influence. Its impact is two-fold: (1) identifying "canaries in the coal mine" who could be observed by misinformation researchers or platforms to identify misinformation narratives before super-influencers spread them to large audiences, and (2) enabling digital marketing targeted toward upstream Alters of super-influencers.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2110.04398

doi 10.1103/PhysRevE.108.014306

The Role of Masks in Mitigating Viral Spread on Networks

Authors: Yurun Tian, Anirudh Sridhar, Chai Wah Wu, Simon A. Levin, Kathleen M. Carley, H. Vincent Poor, Osman Yagan

Abstract: Masks have remained an important mitigation strategy in the fight against COVID-19 due to their ability to prevent the transmission of respiratory droplets between individuals. In this work, we provide a comprehensive quantitative analysis of the impact of mask-wearing. To this end, we propose a novel agent-based model of viral spread on networks where agents may either wear no mask or wear one of several types of masks with different properties (e.g., cloth or surgical). We derive analytical expressions for three key epidemiological quantities: the probability of emergence, the epidemic threshold, and the expected epidemic size. In particular, we show how these quantities depend on the structure of the contact network, viral transmission dynamics, and the distribution of the different types of masks within the population. Through extensive simulations, we then investigate the impact of different allocations of masks within the population and trade-offs between the outward efficiency and inward efficiency of the masks. Interestingly, we find that masks with high outward efficiency and low inward efficiency are most useful for controlling the spread in the early stages of an epidemic, while masks with high inward efficiency but low outward efficiency are most useful in reducing the size of an already large spread. Lastly, we study whether degree-based mask allocation is more effective than random allocation in reducing the probability of an epidemic as well as epidemic size. The result echoes previous findings that mitigation strategies should differ based on the stage of the spreading process, focusing on source control before the epidemic emerges and on self-protection after the emergence.

Submitted 6 June, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: Accepted at Physical Review E

arXiv:2109.00945

doi 10.1007/s10588-022-09371-2

Coordinating Narratives and the Capitol Riots on Parler

Authors: Lynnette Hui Xian Ng, Iain Cruickshank, Kathleen M. Carley

Abstract: Coordinated disinformation campaigns are used to influence social media users, potentially leading to offline violence. In this study, we introduce a general methodology to uncover coordinated messaging through analysis of user parleys on Parler. The proposed method constructs a user-to-user coordination network graph induced by a user-to-text graph and a text-to-text similarity graph. The text-to-text graph is constructed based on the textual similarity of Parler posts. We study three influential groups of users in the 6 January 2021 Capitol riots and detect networks of coordinated user clusters that are all posting similar textual content in support of different disinformation narratives related to the U.S. 2020 elections.

Submitted 2 September, 2021; originally announced September 2021.

Journal ref: Computational and Mathematical Organization Theory (2022)
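One way to read the graph construction in this abstract is as a matrix product. Let U be a users-by-posts authorship matrix and S a posts-by-posts similarity matrix. Then C = U S U^T scores each user pair by the similarity of their posts. The sketch below implements that reading with plain Python lists. The abstract does not state the authors' exact weighting or thresholding, so treat the formula as an assumption. The function names are hypothetical.

```python
def matmul(A, B):
    """Multiply two dense matrices stored as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def coordination_network(user_text, text_sim):
    """Induce a user-to-user matrix C = U S U^T (hypothetical construction).

    user_text: users x posts matrix U (1 if the user wrote the post, else 0).
    text_sim:  posts x posts similarity matrix S (e.g. 1 for similar posts, else 0).
    """
    C = matmul(matmul(user_text, text_sim), transpose(user_text))
    # Drop self-loops: a user's similarity to their own posts is not coordination.
    for i in range(len(C)):
        C[i][i] = 0
    return C
```

For instance, with three users who each wrote one post and only the first two posts marked as similar, the result links users 0 and 1 and leaves user 2 isolated.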

arXiv:2107.09183

Analysis of External Content in the Vaccination Discussion on Twitter

Authors: Richard Kuzma, Iain J. Cruickshank, Kathleen M. Carley

Abstract: The spread of coronavirus and anti-vaccine conspiracies online hindered public health responses to the pandemic. We examined the content of external articles shared on Twitter from February to June 2020 to understand how conspiracy theories and fake news competed with legitimate sources of information. Examining external content (articles, rather than social media posts) is a novel methodology that allows for non-social-media-specific analysis of misinformation, tracking of changing narratives over time, and determining which types of resources (government, news, scientific, or dubious) dominate the pandemic vaccine conversation. We find that distinct narratives emerge, that those narratives change over time, and that a lack of government and scientific messaging on coronavirus created an information vacuum filled by both traditional news and conspiracy theories.

Submitted 3 September, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: Data Ownership Issues

arXiv:2107.03318

Climate Change Conspiracy Theories on Social Media

Authors: Aman Tyagi, Kathleen M. Carley

Abstract: One of the critical emerging challenges in climate change communication is the prevalence of conspiracy theories. This paper discusses some of the major conspiracy theories related to climate change found in a large Twitter corpus. We use a state-of-the-art stance detection method to find whether conspiracy theories are more popular among Disbelievers or Believers of climate change. We then analyze which conspiracy theories are more popular than others and how popularity changes with climate change belief. We find that Disbelievers of climate change are overwhelmingly responsible for sharing messages with conspiracy theory-related keywords, and that not all conspiracy theories are equally shared. Lastly, we discuss the implications of our findings for climate change communication.

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2105.07454

A Synchronized Action Framework for Responsible Detection of Coordination on Social Media

Authors: Thomas Magelinski, Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: The study of coordinated manipulation of conversations on social media has become more prevalent as social media's role in amplifying misinformation, hate, and polarization has come under scrutiny. We discuss the implications of successful coordination detection algorithms based on shifts of power, and consider how responsible coordination detection may be carried out through synchronized action. We then propose a Synchronized Action Framework for detection of automated coordination through construction and analysis of multi-view networks. We validate our framework by examining the Reopen America conversation on Twitter, discovering three coordinated campaigns. We further investigate covert coordination surrounding the protests and find the task to be far more complex than examples seen in prior work, demonstrating the need for our multi-view approach. A cluster of suspicious users is identified and the activity of three members is detailed. These users amplify protest messages using the same hashtags at very similar times, though each focuses on a different state. Through this analysis, we emphasize both the potential usefulness of coordination detection algorithms in investigating amplification, and the need for careful and responsible deployment of such tools.

Submitted 5 June, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

arXiv:2104.10398

doi 10.1038/s41598-021-87709-7

Learning future terrorist targets through temporal meta-graphs

Authors: Gian Maria Campedelli, Mihovil Bartulovic, Kathleen M. Carley

Abstract: In the last 20 years, terrorism has led to hundreds of thousands of deaths and massive economic, political, and humanitarian crises in several regions of the world. Using real-world data on attacks that occurred in Afghanistan and Iraq from 2001 to 2018, we propose the use of temporal meta-graphs and deep learning to forecast future terrorist targets. Focusing on three event dimensions, i.e., employed weapons, deployed tactics and chosen targets, meta-graphs map the connections among temporally close attacks, capturing their operational similarities and dependencies. From these temporal meta-graphs, we derive 2-day-based time series that measure the centrality of each feature within each dimension over time. With the problem formulated in the context of the strategic behavior of terrorist actors, these multivariate temporal sequences are then used to learn which target types are at the highest risk of being chosen. The paper makes two contributions. First, it demonstrates that engineering the feature space via temporal meta-graphs produces richer knowledge than shallow time series that only rely on the frequency of feature occurrences. Second, the experiments reveal that bi-directional LSTM networks achieve superior forecasting performance compared to other algorithms, calling for future research aimed at fully discovering the potential of artificial intelligence to counter terrorist violence.

Submitted 21 April, 2021; originally announced April 2021.

Comments: 19 pages, 18 figures

Journal ref: Sci Rep 11, 8533 (2021)

arXiv:2104.05611

Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests

Authors: Ramon Villa-Cox, Helen Zeng, Ashiqur R. KhudaBukhsh, Kathleen M. Carley

Abstract: Research across different disciplines has documented the expanding polarization in social media. However, much of it has focused on the US political system or its culturally controversial topics. In this work, we explore polarization on Twitter in a different context, namely the protests that paralyzed several countries in the South American region in 2019. By leveraging users' endorsement of politicians' tweets and hashtag campaigns with defined stances towards the protests (for or against), we construct a weakly labeled stance dataset with millions of users. We explore polarization in two related dimensions: language and news consumption patterns. In terms of linguistic polarization, we apply recent insights that leveraged machine translation methods, showing that the two communities speak consistently "different" languages, mainly along ideological lines (e.g., fascist translates to communist). Our results indicate that this recently proposed methodology is also informative in languages and contexts other than those to which it was originally applied. In terms of news consumption patterns, we cluster news agencies based on the homogeneity of their user bases and quantify the observed polarization in their consumption. We find empirical evidence of the "filter bubble" phenomenon during the event: we show not only that the user bases are homogeneous in terms of stance, but also that the probability that a user transitions between media of different clusters is low.

Submitted 5 April, 2021; originally announced April 2021.
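The news-consumption finding rests on how often users move between media clusters. The minimal sketch below takes two hypothetical inputs: each user's chronologically ordered list of shared outlets and a precomputed outlet-to-cluster mapping. It estimates the share of consecutive shares that cross a cluster boundary. A low value is consistent with the filter-bubble pattern described above. This is a simplified stand-in, not the paper's actual procedure, and the function name is illustrative.

```python
from collections import Counter


def cross_cluster_transition_rate(user_sequences, outlet_cluster):
    """Fraction of consecutive news shares, per user, that switch media clusters.

    user_sequences: dict mapping user -> chronologically ordered list of outlets.
    outlet_cluster: dict mapping outlet -> cluster id (outlets not in it are skipped).
    """
    counts = Counter()
    for outlets in user_sequences.values():
        clusters = [outlet_cluster[o] for o in outlets if o in outlet_cluster]
        # Compare each share with the user's next share.
        for prev, nxt in zip(clusters, clusters[1:]):
            counts["switch" if prev != nxt else "stay"] += 1
    total = counts["switch"] + counts["stay"]
    return counts["switch"] / total if total else 0.0
```

Pooling transitions across users, as done here, weights heavy sharers more. Averaging a per-user rate would be an equally reasonable alternative.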

arXiv:2104.01215

The Coronavirus is a Bioweapon: Analysing Coronavirus Fact-Checked Stories

Authors: Lynnette Hui Xian Ng, Kathleen M. Carley

Abstract: The 2020 coronavirus pandemic has heightened the need to flag coronavirus-related misinformation, and fact-checking groups have taken to verifying misinformation on the Internet. We explore stories reported by the fact-checking groups PolitiFact, Poynter and Snopes from January to June 2020, grouping them into six story clusters and then analysing time-series and story validity trends and the level of agreement across sites. We further break down the story clusters into more granular story types by proposing a novel automated method with a BERT classifier, which can be used to classify diverse story sources, in both fact-checked stories and tweets.

Submitted 2 April, 2021; originally announced April 2021.

Journal ref: SBP-BRiMS 2020 COVID Special Track

arXiv:2103.07098

A Weakly Supervised Approach for Classifying Stance in Twitter Replies

Authors: Sumeet Kumar, Ramon Villa Cox, Matthew Babcock, Kathleen M. Carley

Abstract: Conversations on social media (SM) are increasingly being used to investigate social issues on the web, such as online harassment and rumor spread. For such issues, a common thread of research uses adversarial reactions, e.g., replies pointing out factual inaccuracies in rumors. Though adversarial reactions are prevalent in online conversations, inferring those adverse views (or stance) from the text in replies is difficult and requires complex natural language processing (NLP) models. Moreover, conventional NLP models for stance mining need labeled data for supervised learning. Getting labeled conversations can itself be challenging, as conversations can be on any topic and topics change over time. These challenges make learning the stance a difficult NLP problem. In this research, we first create a new stance dataset composed of three different topics by labeling both users' opinions on the topics (as in pro/con) and users' stance while replying to others' posts (as in favor/oppose). As we find limitations with supervised approaches, we propose a weakly-supervised approach to predict the stance in Twitter replies. Our novel method allows using a smaller number of hashtags to generate weak labels for Twitter replies. Compared to supervised learning, our method improves the mean F1-macro by 8% on the hand-labeled dataset without using any hand-labeled examples in the training set. We further show the applicability of our proposed method on COVID-19-related conversations on Twitter.

Submitted 12 March, 2021; originally announced March 2021.
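The abstract does not spell out how hashtags become reply labels, so the sketch below shows one plausible weak-labeling scheme under stated assumptions:

- A small, hypothetical seed lexicon maps hashtags to a pro or con opinion.
- Each post's opinion is inferred from the seed hashtags it contains.
- A reply is weakly labeled favor when its opinion matches the parent post's opinion, and oppose when it differs.
- Posts without an unambiguous seed hashtag stay unlabeled.

This illustrates weak supervision in general, not the authors' exact method. The function names and the lexicon are illustrative.

```python
def opinion_from_hashtags(text, seed_hashtags):
    """Return 'pro', 'con', or None from a (hypothetical) seed-hashtag lexicon."""
    # Lower-case hashtags and strip trailing punctuation such as "#MasksOn!".
    tags = {w.lower().rstrip(".,!?") for w in text.split() if w.startswith("#")}
    votes = {seed_hashtags[t] for t in tags if t in seed_hashtags}
    # Abstain when no seed hashtag appears or the seed hashtags disagree.
    return votes.pop() if len(votes) == 1 else None


def weak_reply_stance(parent_text, reply_text, seed_hashtags):
    """Weak label: 'favor' if parent and reply share an opinion, 'oppose' if not."""
    parent = opinion_from_hashtags(parent_text, seed_hashtags)
    reply = opinion_from_hashtags(reply_text, seed_hashtags)
    if parent is None or reply is None:
        return None  # leave the reply unlabeled
    return "favor" if parent == reply else "oppose"


SEED = {"#maskson": "pro", "#nomasks": "con"}  # hypothetical seed lexicon
```

For example, `weak_reply_stance("Wear one #MasksOn", "Never #NoMasks!", SEED)` yields `"oppose"`. Labels produced this way are noisy, which is why they would typically train a text classifier rather than serve as final predictions.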

arXiv:2008.13054

Polarizing Tweets on Climate Change

Authors: Aman Tyagi, Matthew Babcock, Kathleen M. Carley, Douglas C. Sicker

Abstract: We introduce a framework to analyze the conversation between two competing groups of Twitter users: one that believes in the anthropogenic causes of climate change (Believers) and a second that is skeptical (Disbelievers). As a case study, we use climate-change-related tweets during the United Nations (UN) Climate Change Conference COP24 (2018) in Katowice, Poland. We find that both Disbelievers and Believers talk within their group more than with the other group; this is more so the case for Disbelievers than for Believers. Disbeliever messages focused more on attacking personalities who believe in the anthropogenic causes of climate change. On the other hand, Believer messages focused on calls to combat climate change. We find that bot-like accounts were equally active among Disbelievers and Believers and that, unlike Believers, Disbelievers get their news from a concentrated set of news sources.

Submitted 29 August, 2020; originally announced August 2020.

Journal ref: Proceedings of the International Conference SBP-BRiMS 2020, Halil Bisgin, Ayaz Hyder, Chris Dancy, and Robert Thomson (Eds.) Washington DC, October 2020, Springer
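The within-group versus cross-group comparison in this abstract reduces to a simple ratio. The minimal sketch below assumes a list of directed interactions (for example, replies or mentions from one user to another) and a hypothetical mapping from users to their group (Believer or Disbeliever). For each group, it computes the fraction of that group's interactions that stay inside the group. The paper's framework is richer than this single statistic, and the function name is illustrative.

```python
from collections import defaultdict


def within_group_share(interactions, group_of):
    """Fraction of each group's outgoing interactions that target its own group.

    interactions: iterable of (source_user, target_user) pairs, e.g. replies or mentions.
    group_of: dict mapping user -> group label; unlabeled users are skipped.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for src, dst in interactions:
        g_src, g_dst = group_of.get(src), group_of.get(dst)
        if g_src is None or g_dst is None:
            continue
        total[g_src] += 1
        if g_src == g_dst:
            same[g_src] += 1
    return {g: same[g] / total[g] for g in total}
```

A share above what random mixing would produce for both groups matches the "talk within their group" pattern. A higher share for one group corresponds to the asymmetry reported for Disbelievers.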

arXiv:2008.13051

Affective Polarization in Online Climate Change Discourse on Twitter

Authors: Aman Tyagi, Joshua Uyheng, Kathleen M. Carley

Abstract: Online social media has become an important platform for organizing around different socio-cultural and political topics. Extensive scholarship has discussed how people are divided into echo-chamber-like groups. However, little work has quantified hostile communication, or affective polarization, between two competing groups. This paper proposes a systematic, network-based methodology for examining affective polarization in online conversations. Further, we apply our framework to 100 weeks of Twitter discourse about climate change. We find that deniers of climate change (Disbelievers) are more hostile towards people who believe (Believers) in the anthropogenic cause of climate change than vice versa. Moreover, Disbelievers use more words and hashtags related to natural disasters during more hostile weeks as compared to Believers. These findings bear implications for studying affective polarization in online discourse, especially concerning the subject of climate change. Lastly, we discuss our findings in the context of increasingly important climate change communication research.

Submitted 29 August, 2020; originally announced August 2020.

arXiv:2008.10102

Social Cybersecurity Chapter 13: Case Study with COVID-19 Pandemic

Authors: David M. Beskow, Kathleen M. Carley

Abstract: The purpose of this case study is to leverage the concepts and tools presented in the preceding chapters and apply them in a real-world social cybersecurity context. With the COVID-19 pandemic emerging as a defining event of the 21st century and a magnet for disinformation maneuvers, we have selected the pandemic and its related social media conversation as the focus of our efforts. This chapter therefore applies the tools of information operation maneuver, bot detection and characterization, meme detection and characterization, and information mapping to the COVID-19-related conversation on Twitter. This chapter uses these tools to analyze a stream containing 206 million tweets from 27 million unique users from 15 March 2020 to 30 April 2020. Our results shed light on elaborate information operations that leverage the full breadth of the BEND maneuvers and use bots for important shaping operations.

Submitted 23 August, 2020; originally announced August 2020.
