-
Quantifying Influencer Impact on Affective Polarization
Authors:
Rezaur Rashid,
Joshua Melton,
Ouldouz Ghorbani,
Siddharth Krishnan,
Shannon Reid,
Gabriel Terejanu
Abstract:
In today's digital age, social media platforms play a crucial role in shaping public opinion. This study explores how discussions led by influencers on Twitter, now known as 'X', affect public sentiment and contribute to online polarization. We developed a counterfactual framework to analyze the polarization scores of conversations in scenarios both with and without the presence of an influential figure. Two case studies, centered on the polarizing issues of climate change and gun control, were examined. Our research highlights the significant impact these figures have on public discourse, providing valuable insights into how online discussions can influence societal divisions.
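The abstract does not specify the polarization score or how the counterfactual conversation is constructed. As a minimal sketch under loudly stated assumptions, the snippet below uses the population variance of per-tweet stance scores in [-1, 1] as a stand-in polarization score, and builds the counterfactual conversation by dropping the influencer's tweets and direct replies to them. Both choices, the tweet dictionary layout, and the names `polarization` and `influencer_effect` are hypothetical and are not the authors' framework.

```python
from statistics import pvariance


def polarization(stances):
    """Hypothetical polarization score: population variance of stance scores in [-1, 1]."""
    return pvariance(stances) if len(stances) > 1 else 0.0


def influencer_effect(tweets, influencer):
    """Observed minus counterfactual polarization of one conversation.

    tweets: list of dicts with 'author', 'reply_to' (author being replied to, or None)
    and 'stance'. ASSUMPTION: the counterfactual removes the influencer's own tweets
    and every direct reply to the influencer.
    """
    observed = [t["stance"] for t in tweets]
    counterfactual = [
        t["stance"]
        for t in tweets
        if t["author"] != influencer and t["reply_to"] != influencer
    ]
    return polarization(observed) - polarization(counterfactual)
```

A positive result means the conversation is more spread out in stance when the influencer is present than in the counterfactual without them; a real analysis would substitute the paper's own polarization measure.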
Submitted 16 September, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural Networks
Authors:
Joshua Melton,
Shannon Reid,
Gabriel Terejanu,
Siddharth Krishnan
Abstract:
The high volume and rapid evolution of content on social media present major challenges for studying the stance of social media users. In this work, we develop a two-stage stance labeling method that utilizes the user-hashtag bipartite graph and the user-user interaction graph. In the first stage, a simple and efficient heuristic for stance labeling uses the user-hashtag bipartite graph to iteratively update the stance association of user and hashtag nodes via a label propagation mechanism. This set of soft labels is then integrated with the user-user interaction graph to train a graph neural network (GNN) model using semi-supervised learning. We evaluate this method on two large-scale datasets containing tweets related to climate change from June 2021 to June 2022 and gun control from January 2022 to January 2023. Our experiments demonstrate that enriching text-based embeddings of users with network information from the user interaction graph using our semi-supervised GNN method outperforms both classifiers trained on user textual embeddings and zero-shot classification using LLMs such as GPT-4. We discuss the need for integrating nuanced understanding from social science with the scalability of computational methods to better understand how polarization on social media occurs for divisive issues such as climate change and gun control.
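The first stage can be sketched as label propagation over the user-hashtag bipartite graph. In this minimal version, which fills in details the abstract leaves open, a few seed hashtags are clamped to +1 or -1, each user takes the mean score of the hashtags they used, and each unseeded hashtag takes the mean score of its users; the resulting user scores act as soft labels for the second (GNN) stage. The mean aggregation, the clamping of seeds, the fixed iteration count, and the name `propagate` are assumptions, not the paper's exact heuristic.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=10):
    """Label propagation on a user-hashtag bipartite graph.

    edges: list of (user, hashtag) pairs; seeds: dict hashtag -> +1.0 or -1.0.
    Returns (user_score, tag_score), soft stance scores in [-1, 1].
    """
    user_tags = defaultdict(list)
    tag_users = defaultdict(list)
    for user, tag in edges:
        user_tags[user].append(tag)
        tag_users[tag].append(user)

    # Seed hashtags start at their label; all other hashtags start neutral.
    tag_score = {tag: seeds.get(tag, 0.0) for tag in tag_users}
    user_score = {}
    for _ in range(iterations):
        # Users take the mean stance of the hashtags they used.
        user_score = {
            u: sum(tag_score[t] for t in tags) / len(tags)
            for u, tags in user_tags.items()
        }
        # Unseeded hashtags take the mean stance of their users; seeds stay clamped.
        for tag, users in tag_users.items():
            if tag not in seeds:
                tag_score[tag] = sum(user_score[u] for u in users) / len(users)
    return user_score, tag_score
```

Clamping the seeds keeps the propagation anchored, so scores drift toward the stance of whichever seed hashtags a user's neighborhood connects to rather than collapsing to zero.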
Submitted 17 May, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Flurry: a Fast Framework for Reproducible Multi-layered Provenance Graph Representation Learning
Authors:
Maya Kapoor,
Joshua Melton,
Michael Ridenhour,
Mahalavanya Sriram,
Thomas Moyer,
Siddharth Krishnan
Abstract:
Complex heterogeneous dynamic networks like knowledge graphs are powerful constructs that can be used in modeling data provenance from computer systems. From a security perspective, these attributed graphs enable causality analysis and tracing for analyzing a myriad of cyberattacks. However, there is a paucity of systematically developed pipelines that transform system executions and provenance into usable graph representations for machine learning tasks. This lack of instrumentation severely inhibits scientific advancement in provenance graph machine learning by hindering reproducibility and limiting the availability of data that are critical for techniques like graph neural networks. To fulfill this need, we present Flurry, an end-to-end data pipeline which simulates cyberattacks, captures provenance data from these attacks at multiple system and application layers, converts audit logs from these attacks into data provenance graphs, and incorporates this data with a framework for training deep neural models that supports preconfigured or custom-designed models for analysis in real-world resilient systems. We showcase this pipeline by processing data from multiple system attacks and performing anomaly detection via graph classification using current benchmark graph representation learning frameworks. Flurry provides a fast, customizable, extensible, and transparent solution for providing this much needed data to cybersecurity professionals.
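One step in the pipeline converts audit logs into data provenance graphs. The sketch below illustrates that idea only: it assumes a hypothetical, simplified `key=value` record format (not the real Linux audit format and not Flurry's parser) and a hypothetical three-syscall mapping, and emits directed edges that follow information flow between process and file nodes. The names `FLOWS`, `parse_line`, and `build_provenance_graph` are illustrative.

```python
# Hypothetical mapping: syscall -> (information-flow direction, edge label).
FLOWS = {
    "read": ("in", "read"),     # data flows from the file into the process
    "execve": ("in", "exec"),   # the executable image flows into the process
    "write": ("out", "write"),  # data flows from the process into the file
}


def parse_line(line):
    """Parse a hypothetical 'key=value key=value ...' audit record into a dict."""
    return dict(field.split("=", 1) for field in line.split())


def build_provenance_graph(lines):
    """Return (nodes, edges); edges are (source, destination, label) triples."""
    nodes, edges = {}, []
    for line in lines:
        rec = parse_line(line)
        flow = FLOWS.get(rec.get("syscall"))
        if flow is None:
            continue  # syscalls outside the toy mapping are ignored
        direction, label = flow
        proc = f"proc:{rec['pid']}"
        obj = f"file:{rec['path']}"
        nodes[proc] = {"type": "process", "exe": rec["exe"]}
        nodes.setdefault(obj, {"type": "file"})
        # Orient each edge along the direction in which information moves.
        edges.append((obj, proc, label) if direction == "in" else (proc, obj, label))
    return nodes, edges
```

Orienting edges by information flow is what makes such a graph useful for causality tracing: following edges forward from a compromised file reaches every process and file it could have influenced.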
Submitted 5 March, 2022;
originally announced March 2022.
-
Pay Attention to Relations: Multi-embeddings for Attributed Multiplex Networks
Authors:
Joshua Melton,
Michael Ridenhour,
Siddharth Krishnan
Abstract:
Graph Convolutional Neural Networks (GCNs) have become effective machine learning algorithms for many downstream network mining tasks such as node classification, link prediction, and community detection. However, most GCN methods have been developed for homogeneous networks and are limited to a single embedding for each node. Complex systems, often represented by heterogeneous, multiplex networks, present a more difficult challenge for GCN models and require that such techniques capture the diverse contexts and assorted interactions that occur between nodes. In this work, we propose RAHMeN, a novel unified relation-aware embedding framework for attributed heterogeneous multiplex networks. Our model incorporates node attributes, motif-based features, relation-based GCN approaches, and relational self-attention to learn embeddings of nodes with respect to the various relations in a heterogeneous, multiplex network. In contrast to prior work, RAHMeN is a more expressive embedding framework that embraces the multi-faceted nature of nodes in such networks, producing a set of multi-embeddings that capture the varied and diverse contexts of nodes.
We evaluate our model on four real-world datasets from Amazon, Twitter, YouTube, and Tissue PPIs in both transductive and inductive settings. Our results show that RAHMeN consistently outperforms comparable state-of-the-art network embedding models, and an analysis of RAHMeN's relational self-attention demonstrates that our model discovers interpretable connections between relations present in heterogeneous, multiplex networks.
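Relational self-attention, as described above, lets each relation-specific embedding of a node attend to that node's embeddings under the other relations, producing one output embedding per relation (the multi-embeddings). The sketch below shows only that attention step, as plain scaled dot-product attention with identity query, key, and value projections; a trained model would typically use learnable projections and, per the abstract, also draws on node attributes, motif-based features, and relation-based GCN layers, none of which are reproduced here. The relation names in the usage are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relational_self_attention(rel_embs):
    """Scaled dot-product self-attention across one node's relation embeddings.

    rel_embs: dict relation -> list[float], all of the same dimension d.
    Returns (multi_embeddings, attention), both keyed by relation.
    ASSUMPTION: identity projections stand in for learned query/key/value weights.
    """
    rels = list(rel_embs)
    d = len(next(iter(rel_embs.values())))
    out, attn = {}, {}
    for r in rels:
        q = rel_embs[r]
        scores = [
            sum(a * b for a, b in zip(q, rel_embs[s])) / math.sqrt(d) for s in rels
        ]
        weights = softmax(scores)
        attn[r] = dict(zip(rels, weights))
        # Each output embedding is an attention-weighted mix of all relation embeddings.
        out[r] = [
            sum(w * rel_embs[s][i] for w, s in zip(weights, rels)) for i in range(d)
        ]
    return out, attn
```

Inspecting the `attn` weights is the kind of analysis the abstract refers to: relations whose embeddings attend strongly to each other indicate related interaction contexts.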
Submitted 3 March, 2022;
originally announced March 2022.
-
DeL-haTE: A Deep Learning Tunable Ensemble for Hate Speech Detection
Authors:
Joshua Melton,
Arunkumar Bagavathi,
Siddharth Krishnan
Abstract:
Online hate speech on social media has become a fast-growing problem in recent times. Nefarious groups have developed large content delivery networks across several mainstream (Twitter and Facebook) and fringe (Gab, 4chan, 8chan, etc.) outlets to deliver cascades of hate messages directed both at individuals and communities. Thus, addressing these issues has become a top priority for large-scale social media outlets. Three key challenges in automated detection and classification of hateful content are the lack of clearly labeled data, evolving vocabulary and lexicon (hashtags, emojis, etc.), and the lack of baseline models for fringe outlets such as Gab. In this work, we propose a novel framework with three major contributions. (a) We engineer an ensemble of deep learning models that combines the strengths of state-of-the-art approaches, (b) we incorporate a tuning factor into this framework that leverages transfer learning to conduct automated hate speech classification on unlabeled datasets, like Gab, and (c) we develop a weakly supervised learning methodology that allows our framework to train on unlabeled data. Our ensemble models achieve an 83% hate recall on the HON dataset, surpassing the performance of the state-of-the-art deep models. We demonstrate that weakly supervised training in combination with classifier tuning significantly increases model performance on unlabeled data from Gab, achieving a hate recall of 67%.
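The results above are reported as hate recall: the fraction of truly hateful posts that the classifier flags as hateful. The sketch below computes that metric for a plain probability-averaging ensemble. The abstract does not say how its tuning factor works, so a decision threshold `tau` is used here purely as a hypothetical stand-in; the code illustrates how one ensemble-level parameter can shift recall, not the paper's transfer-learning mechanism. The names `ensemble_predict` and `hate_recall` are illustrative.

```python
def ensemble_predict(member_probs, tau=0.5):
    """Average P(hate) across ensemble members, then threshold.

    member_probs: list of per-model lists, each giving P(hate) for every post.
    tau: hypothetical decision threshold standing in for a tuning factor.
    Returns a list of 0/1 predictions (1 = hate).
    """
    n_models = len(member_probs)
    averaged = [sum(col) / n_models for col in zip(*member_probs)]
    return [1 if p >= tau else 0 for p in averaged]


def hate_recall(y_true, y_pred):
    """True positives divided by all truly hateful posts (0.0 if there are none)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return tp / positives if positives else 0.0
```

Lowering `tau` flags more posts, which can only raise recall but typically admits more false positives, so recall figures like those above are most informative alongside precision or F1.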
Submitted 3 November, 2020;
originally announced November 2020.
-
The Fundamentals of Policy Crowdsourcing
Authors:
John Prpic,
Araz Taeihagh,
James Melton
Abstract:
What is the state of the research on crowdsourcing for policy making? This article begins to answer this question by collecting, categorizing, and situating an extensive body of the extant research investigating policy crowdsourcing within a new framework built on fundamental typologies from each field. We first define seven universal characteristics of the three general crowdsourcing techniques (virtual labor markets, tournament crowdsourcing, open collaboration) to examine the relative trade-offs of each modality. We then compare these three types of crowdsourcing to the different stages of the policy cycle, in order to situate the literature spanning both domains. We finally discuss research trends in crowdsourcing for public policy, and highlight the research gaps and overlaps in the literature.
KEYWORDS: crowdsourcing, policy cycle, crowdsourcing trade-offs, policy processes, policy stages, virtual labor markets, tournament crowdsourcing, open collaboration
Submitted 8 February, 2018;
originally announced February 2018.
-
MOOCs and Crowdsourcing: Massive Courses and Massive Resources
Authors:
John Prpic,
James Melton,
Araz Taeihagh,
Terry Anderson
Abstract:
Premised upon the observation that MOOC and crowdsourcing phenomena share several important characteristics, including IT mediation, large-scale human participation, and varying levels of openness to participants, this work systematizes a comparison of MOOC and crowdsourcing phenomena along these salient dimensions. In doing so, we learn that both domains share further common traits, including similarities in IT structures, knowledge generating capabilities, presence of intermediary service providers, and techniques designed to attract and maintain participant activity. Stemming directly from this analysis, we discuss new directions for future research in both fields and draw out actionable implications for practitioners and researchers in both domains.
Submitted 10 February, 2017;
originally announced February 2017.
-
Experiments on Crowdsourcing Policy Assessment
Authors:
J. Prpic,
A. Taeihagh,
J. Melton
Abstract:
Can Crowds serve as useful allies in policy design? How do non-expert Crowds perform relative to experts in the assessment of policy measures? Does the geographic location of non-expert Crowds, with relevance to the policy context, alter the performance of non-expert Crowds in the assessment of policy measures? In this work, we investigate these questions by undertaking experiments designed to replicate expert policy assessments with non-expert Crowds recruited from Virtual Labor Markets. We use a set of ninety-six climate change adaptation policy measures previously evaluated by experts in the Netherlands as our control condition to conduct experiments using two discrete sets of non-expert Crowds recruited from Virtual Labor Markets. We vary the composition of our non-expert Crowds along two conditions: participants recruited from a geographical location directly relevant to the policy context and participants recruited at-large. We discuss our research methods in detail and provide the findings of our experiments.
Submitted 10 February, 2017;
originally announced February 2017.
-
Crowdsourcing the Policy Cycle
Authors:
J. Prpic,
A. Taeihagh,
J. Melton
Abstract:
Crowdsourcing is beginning to be used for policymaking. The wisdom of crowds [Surowiecki 2005], and crowdsourcing [Brabham 2008], are seen as new avenues that can shape all kinds of policy, from transportation policy [Nash 2009] to urban planning [Seltzer and Mahmoudi 2013], to climate policy. In general, many have high expectations for positive outcomes with crowdsourcing, and based on both anecdotal and empirical evidence, some of these expectations seem justified [Majchrzak and Malhotra 2013]. Yet, to our knowledge, research has yet to emerge that unpacks the different forms of crowdsourcing in light of each stage of the well-established policy cycle. This work addresses this research gap, and in doing so brings increased nuance to the application of crowdsourcing techniques for policymaking.
Submitted 10 February, 2017;
originally announced February 2017.
-
A Framework for Policy Crowdsourcing
Authors:
J. Prpic,
A. Taeihagh,
J. Melton
Abstract:
What is the state of the literature with respect to Crowdsourcing for policy making? This work attempts to answer this question by collecting, categorizing, and situating the extant research investigating Crowdsourcing for policy within the broader Crowdsourcing literature. To do so, the work first extends the Crowdsourcing literature by introducing, defining, explaining, and using seven universal characteristics of all general Crowdsourcing techniques, to vividly draw out the relative trade-offs of each mode of Crowdsourcing. From this beginning, the work systematically and explicitly weds the three types of Crowdsourcing to the stages of the Policy cycle as a method of situating the extant literature spanning both domains. Thereafter, drawing on our analysis, we discuss the trends, highlight the research gaps, and outline the overlaps in the research on Crowdsourcing for policy.
Submitted 10 February, 2017;
originally announced February 2017.
-
A Critique of ANSI SQL Isolation Levels
Authors:
Hal Berenson,
Phil Bernstein,
Jim Gray,
Jim Melton,
Elizabeth O'Neil,
Patrick O'Neil
Abstract:
ANSI SQL-92 defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to characterize several popular isolation levels, including the standard locking implementations of the levels. Investigating the ambiguities of the phenomena leads to clearer definitions; in addition, new phenomena that better characterize isolation types are introduced. An important multiversion isolation type, Snapshot Isolation, is defined.
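Snapshot Isolation, as the paper defines it, lets each transaction read from a snapshot of the data committed as of its start timestamp, and under the First-Committer-Wins rule a transaction aborts at commit if another transaction that committed after it started wrote any of the same items. The toy multiversion key-value store below is a minimal single-threaded sketch of exactly that rule; the class names and the `WriteConflict` exception are hypothetical and do not model any particular database engine.

```python
class WriteConflict(Exception):
    """Raised when First-Committer-Wins forces a transaction to abort."""


class SnapshotStore:
    """Toy multiversion key-value store illustrating Snapshot Isolation."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending by commit_ts
        self.clock = 0      # timestamp of the most recent commit

    def begin(self):
        # The snapshot is everything committed at or before the current clock.
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered until commit

    def read(self, key):
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        # Otherwise return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        store = self.store
        # First-Committer-Wins: abort if any written key has a newer committed version.
        for key in self.writes:
            history = store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise WriteConflict(key)
        store.clock += 1
        for key, value in self.writes.items():
            store.versions.setdefault(key, []).append((store.clock, value))
        return store.clock
```

Because only write-write overlaps are checked, two concurrent transactions that read overlapping data but write disjoint keys both commit; this is how Snapshot Isolation admits the write-skew anomaly that the paper uses to separate it from serializability.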
Submitted 25 January, 2007;
originally announced January 2007.