-
A Computational Analysis and Visualization of In-Text Reference Networks Across Philosophical Texts
Authors:
Robert Becker,
Aron Culotta
Abstract:
We applied computational methods to analyze references across 2,245 philosophical texts, spanning from approximately 550 BCE to 1940 CE, in order to measure patterns in how philosophical ideas have spread over time. Using natural language processing and network analysis, we mapped over 294,970 references between authors, classifying each reference into subdisciplines of philosophy based on its surrounding context. We then constructed a graph, with authors as nodes and textual references as edges, to empirically validate, visualize, and quantify intellectual lineages as they are understood within philosophical scholarship. For instance, we find that Plato and Aristotle alone account for nearly 10% of all references from authors in our dataset, suggesting that their influence may still be underestimated. As another example, we support the view that St. Thomas Aquinas served as a synthesizer between Aristotelian and Christian philosophy by analyzing the network structures of Aquinas, Aristotle, and Christian theologians. Our results are presented through an interactive visualization tool, allowing users to dynamically explore these networks, alongside a mathematical analysis of the network's structure. Our methodology demonstrates the value of applying network analysis with textual references to study a large collection of historical works.
Submitted 6 May, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
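The graph construction described in this abstract (authors as nodes, textual references as edges) can be illustrated with a minimal sketch. The edge list below is a made-up toy example, not data from the paper, and the helper names `build_graph` and `reference_share` are hypothetical; the sketch only shows how a statistic such as "share of all references pointing at Plato and Aristotle" could be computed from a weighted directed graph.

```python
from collections import Counter

# Hypothetical toy edge list: (citing_author, referenced_author).
# These pairs are illustrative only, not data from the paper.
references = [
    ("Aquinas", "Aristotle"),
    ("Aquinas", "Augustine"),
    ("Aquinas", "Aristotle"),
    ("Kant", "Hume"),
    ("Kant", "Plato"),
    ("Hegel", "Kant"),
    ("Hegel", "Aristotle"),
    ("Nietzsche", "Plato"),
    ("Nietzsche", "Kant"),
    ("Hume", "Locke"),
]


def build_graph(edges):
    """Collapse repeated references into a weighted directed graph."""
    return Counter(edges)


def reference_share(graph, targets):
    """Fraction of all references that point at any author in `targets`."""
    total = sum(graph.values())
    to_targets = sum(w for (_, dst), w in graph.items() if dst in targets)
    return to_targets / total if total else 0.0


graph = build_graph(references)
share = reference_share(graph, {"Plato", "Aristotle"})
print(f"Share of references to Plato or Aristotle: {share:.0%}")
```

Storing edges in a `Counter` keeps repeated references as edge weights, which is one simple way to preserve reference counts without a graph library.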
-
Leaders or Followers? A Temporal Analysis of Tweets from IRA Trolls
Authors:
Siva K. Balasubramanian,
Mustafa Bilgic,
Aron Culotta,
Libby Hemphill,
Anita Nikolich,
Matthew A. Shapiro
Abstract:
The Internet Research Agency (IRA) influences online political conversations in the United States, exacerbating existing partisan divides and sowing discord. In this paper we investigate the IRA's communication strategies by analyzing trending terms on Twitter to identify cases in which the IRA leads or follows other users. Our analysis focuses on over 38M tweets posted between 2016 and 2017 from IRA users (n=3,613), journalists (n=976), members of Congress (n=526), and politically engaged users from the general public (n=71,128). We find that the IRA tends to lead on topics related to the 2016 election, race, and entertainment, suggesting that these are areas both of strategic importance and of the highest potential impact. Furthermore, we identify topics where the IRA has been relatively ineffective, such as tweets on military, political scandals, and violent attacks. Despite many tweets on these topics, the IRA rarely leads the conversation and thus has little opportunity to influence it. We offer our proposed methodology as a way to track the strategic choices of future influence operations in real time.
Submitted 4 April, 2022;
originally announced April 2022.
-
Enhancing Model Robustness and Fairness with Causality: A Regularization Approach
Authors:
Zhao Wang,
Kai Shu,
Aron Culotta
Abstract:
Recent work has raised concerns about the risk of spurious correlations and unintended biases in statistical machine learning models that threaten model robustness and fairness. In this paper, we propose a simple and intuitive regularization approach to integrate causal knowledge during model training and build a robust and fair model by emphasizing causal features and de-emphasizing spurious features. Specifically, we first manually identify causal and spurious features with principles inspired by the counterfactual framework of causal inference. Then, we propose a regularization approach to penalize causal and spurious features separately. By adjusting the strength of the penalty for each type of feature, we build a predictive model that relies more on causal features and less on non-causal features. We conduct experiments to evaluate model robustness and fairness on three datasets with multiple metrics. Empirical results show that the new models built with causal awareness significantly improve model robustness with respect to counterfactual texts and model fairness with respect to sensitive attributes.
Submitted 2 October, 2021;
originally announced October 2021.
-
Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals
Authors:
Zhao Wang,
Aron Culotta
Abstract:
Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is from the same distribution as the training data, it can quickly degrade when the test distribution changes. For example, it has been shown that classifiers perform poorly when humans make minor modifications to change the label of an example. One solution to increase model reliability and generalizability is to identify causal associations between features and classes. In this paper, we propose to train a robust text classifier by augmenting the training data with automatically generated counterfactual data. We first identify likely causal features using a statistical matching approach. Next, we generate counterfactual samples for the original training data by substituting causal features with their antonyms and then assigning opposite labels to the counterfactual samples. Finally, we combine the original data and counterfactual data to train a robust classifier. Experiments on two classification tasks show that a traditional classifier trained on the original data does very poorly on human-generated counterfactual samples (e.g., 10%-37% drop in accuracy). However, the classifier trained on the combined data is more robust and performs well on both the original test data and the counterfactual test data (e.g., 12%-25% increase in accuracy compared with the traditional classifier). Detailed analysis shows that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
Submitted 17 December, 2020;
originally announced December 2020.
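The augmentation step described in this abstract (substitute causal features with their antonyms, then assign the opposite label) can be sketched roughly as follows. The antonym lexicon, the causal-feature set, and the function names are hypothetical stand-ins; the paper identifies causal features with a statistical matching approach, which is not reproduced here.

```python
# Hypothetical antonym lexicon and causal-feature set; the paper identifies
# causal features with a statistical matching approach, which is not shown here.
ANTONYMS = {"great": "terrible", "terrible": "great", "love": "hate", "hate": "love"}
CAUSAL_FEATURES = set(ANTONYMS)


def make_counterfactual(text, label):
    """Swap each causal word for its antonym and flip the binary label.

    Returns None when the text contains no causal word, since there is
    nothing to substitute.
    """
    tokens = text.split()
    if not any(t in CAUSAL_FEATURES for t in tokens):
        return None
    swapped = [ANTONYMS.get(t, t) for t in tokens]
    return " ".join(swapped), 1 - label


def augment(dataset):
    """Return the original examples followed by their counterfactuals."""
    counterfactuals = [make_counterfactual(t, y) for t, y in dataset]
    return list(dataset) + [c for c in counterfactuals if c is not None]


# Toy training set with binary sentiment labels (1 = positive, 0 = negative).
train = [("a great movie by spielberg", 1), ("i hate the plot", 0), ("a movie", 1)]
augmented = augment(train)
```

Non-causal tokens such as "spielberg" are left unchanged while the label flips, so a classifier trained on `augmented` sees that token paired with both labels.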
-
Are Words Commensurate with Actions? Quantifying Commitment to a Cause from Online Public Messaging
Authors:
Zhao Wang,
Jennifer Cutler,
Aron Culotta
Abstract:
Public entities such as companies and politicians increasingly use online social networks to communicate directly with their constituencies. Often, this public messaging is aimed at aligning the entity with a particular cause or issue, such as the environment or public health. However, as a consumer or voter, it can be difficult to assess an entity's true commitment to a cause based on public messaging. In this paper, we present a text classification approach to categorize a message according to its commitment level toward a cause. We then compare the volume of such messages with external ratings based on entities' actions (e.g., a politician's voting record with respect to the environment or a company's rating from environmental non-profits). We find that by distinguishing between low- and high-level commitment messages, we can more reliably identify truly committed entities. Furthermore, by measuring the discrepancy between classified messages and external ratings, we can identify entities whose public messaging does not align with their actions, thereby providing a methodology to identify potentially "inauthentic" messaging campaigns.
Submitted 6 October, 2020;
originally announced October 2020.
-
Identifying Spurious Correlations for Robust Text Classification
Authors:
Zhao Wang,
Aron Culotta
Abstract:
The predictions of text classifiers are often driven by spurious correlations -- e.g., the term "Spielberg" correlates with positively reviewed movies, even though the term itself does not semantically convey a positive sentiment. In this paper, we propose a method to distinguish spurious and genuine correlations in text classification. We treat this as a supervised classification problem, using features derived from treatment effect estimators to distinguish spurious correlations from "genuine" ones. Due to the generic nature of these features and their small dimensionality, we find that the approach works well even with limited training examples, and that it is possible to transport the word classifier to new domains. Experiments on four datasets (sentiment classification and toxicity detection) suggest that using this approach to inform feature selection also leads to more robust classification, as measured by improved worst-case accuracy on the samples affected by spurious correlations.
Submitted 5 October, 2020;
originally announced October 2020.
-
Personality and Behavior in Role-based Online Games
Authors:
Zhao Wang,
Anna Sapienza,
Aron Culotta,
Emilio Ferrara
Abstract:
Both offline and online human behaviors are affected by personality. Of special interest are online games, where players have to impersonate specific roles and their behaviors are extensively tracked by the game. In this paper, we propose to study the relationship between players' personality and game behavior in League of Legends (LoL), one of the most popular Multiplayer Online Battle Arena (MOBA) games. We use linear mixed effects (LME) models to describe relationships between players' personality traits (measured by the Five Factor Model) and two major aspects of the game: the impersonated roles and in-game actions. On the one hand, we study relationships within the game environment by modeling role attributes from match behaviors and vice versa. On the other hand, we analyze the relationship between a player's five personality traits and their game behavior by showing significant correlations between each personality trait and the set of corresponding behaviors. Our findings suggest that personality and behavior are highly entangled and provide a new perspective to understand how personality can affect behavior in role-based online games.
Submitted 20 May, 2019;
originally announced May 2019.
-
When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation
Authors:
Zhao Wang,
Aron Culotta
Abstract:
Studies across many disciplines have shown that lexical choice can affect audience perception. For example, how users describe themselves in a social media profile can affect their perceived socio-economic status. However, we lack general methods for estimating the causal effect of lexical choice on the perception of a specific sentence. While randomized controlled trials may provide good estimates, they do not scale to the potentially millions of comparisons necessary to consider all lexical choices. Instead, in this paper, we first offer two classes of methods to estimate the effect on perception of changing one word to another in a given sentence. The first class of algorithms builds upon quasi-experimental designs to estimate individual treatment effects from observational data. The second class treats treatment effect estimation as a classification problem. We conduct experiments with three data sources (Yelp, Twitter, and Airbnb), finding that the algorithmic estimates align well with those produced by randomized controlled trials. Additionally, we find that it is possible to transfer treatment effect classifiers across domains and still maintain high accuracy.
Submitted 14 November, 2018; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Forecasting the presence and intensity of hostility on Instagram using linguistic and social features
Authors:
Ping Liu,
Joshua Guberman,
Libby Hemphill,
Aron Culotta
Abstract:
Online antisocial behavior, such as cyberbullying, harassment, and trolling, is a widespread problem that threatens free discussion and has negative physical and mental health consequences for victims and communities. While prior work has proposed automated methods to identify hostile comments in online discussions, these methods work retrospectively on comments that have already been posted, making it difficult to intervene before an interaction escalates. In this paper we instead consider the problem of forecasting future hostilities in online discussions, which we decompose into two tasks: (1) given an initial sequence of non-hostile comments in a discussion, predict whether some future comment will contain hostility; and (2) given the first hostile comment in a discussion, predict whether this will lead to an escalation of hostility in subsequent comments. Thus, we aim to forecast both the presence and intensity of hostile comments based on linguistic and social features from earlier comments. To evaluate our approach, we introduce a corpus of over 30K annotated Instagram comments from over 1,100 posts. Our approach is able to predict the appearance of a hostile comment on an Instagram post ten or more hours in the future with an AUC of .82 (task 1), and can furthermore distinguish between high and low levels of future hostility with an AUC of .91 (task 2).
Submitted 18 April, 2018;
originally announced April 2018.
-
Deceptiveness of internet data for disease surveillance
Authors:
Reid Priedhorsky,
Dave Osthus,
Ashlynn R. Daughton,
Kelly R. Moran,
Aron Culotta
Abstract:
Quantifying how many people are or will be sick, and where, is a critical ingredient in reducing the burden of disease because it helps the public health system plan and implement effective outbreak response. This process of disease surveillance is currently based on data gathering using clinical and laboratory methods; this distributed human contact and resulting bureaucratic data aggregation yield expensive procedures that lag real time by weeks or months. The promise of new surveillance approaches using internet data, such as web event logs or social media messages, is to achieve the same goal but faster and cheaper. However, prior work in this area lacks a rigorous model of information flow, making it difficult to assess the reliability of both specific approaches and the body of work as a whole.
We model disease surveillance as a Shannon communication. This new framework lets any two disease surveillance approaches be compared using a unified vocabulary and conceptual model. Using it, we describe and compare the deficiencies suffered by traditional and internet-based surveillance, introduce a new risk metric called deceptiveness, and offer mitigations for some of these deficiencies. This framework also makes the rich tools of information theory applicable to disease surveillance. This better understanding will improve the decision-making of public health practitioners by helping to leverage internet-based surveillance in a way complementary to the strengths of traditional surveillance.
Submitted 31 July, 2018; v1 submitted 16 November, 2017;
originally announced November 2017.
-
Co-training for Demographic Classification Using Deep Learning from Label Proportions
Authors:
Ehsan Mohammady Ardehaly,
Aron Culotta
Abstract:
Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant, supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportions (LLP) setting, in which the training data consist of bags of unlabeled instances with associated label distributions for each bag. We introduce a new regularization layer, Batch Averager, that can be appended to the last layer of any deep neural network to convert it from supervised learning to LLP. This layer can be implemented readily with existing deep learning packages. To further support domains in which the data consist of two conditionally independent feature views (e.g. image and text), we propose a co-training algorithm that iteratively generates pseudo bags and refits the deep LLP model to improve classification accuracy. We demonstrate our models on demographic attribute classification (gender and race/ethnicity), which has many applications in social media analysis, public health, and marketing. We conduct experiments to predict demographics of Twitter users based on their tweets and profile image, without requiring any user-level annotations for training. We find that the deep LLP approach outperforms baselines for both text and image features separately. Additionally, we find that the co-training algorithm improves image and text classification by 4% and 8% absolute F1, respectively. Finally, an ensemble of text and image classifiers further improves the absolute F1 measure by 4% on average.
Submitted 12 September, 2017;
originally announced September 2017.
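The LLP setting described in this abstract trains on bags whose label proportions are known while individual labels are not. As a rough illustration of the idea, the sketch below averages per-instance predicted class probabilities over a bag and compares the average with the bag's known proportions. The choice of KL divergence as the comparison, and the names `batch_average` and `bag_loss`, are assumptions made for this example; the abstract does not specify the exact form of the Batch Averager layer or its loss.

```python
import math


def batch_average(instance_probs):
    """Average per-instance class probabilities over one bag."""
    n = len(instance_probs)
    k = len(instance_probs[0])
    return [sum(p[c] for p in instance_probs) / n for c in range(k)]


def bag_loss(instance_probs, bag_proportions, eps=1e-12):
    """KL divergence from the known bag proportions to the averaged prediction.

    The KL form is an assumption for illustration, not the paper's stated loss.
    """
    avg = batch_average(instance_probs)
    return sum(
        q * math.log((q + eps) / (p + eps))
        for q, p in zip(bag_proportions, avg)
        if q > 0
    )


# Hypothetical bag of four unlabeled instances with two classes.
preds = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
loss_match = bag_loss(preds, [0.5, 0.5])      # averaged prediction matches the bag
loss_mismatch = bag_loss(preds, [0.9, 0.1])   # averaged prediction disagrees
```

Because the loss depends only on the bag-level average, no instance-level annotation is needed, which is the property the abstract highlights.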
-
Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions
Authors:
Ehsan Mohammady Ardehaly,
Aron Culotta
Abstract:
Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since it is costly and time-consuming to annotate data for traditional supervised classification, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference using U.S. Census, national and state political polls, and Cook partisan voting index as population level data. In LLP classification settings, the training data is divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement of instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization to support weights for samples inside bags, which is applicable in this setting where bags are arranged hierarchically (e.g., county-level bags are nested inside of state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, resulting in error reductions of 28-44% over baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities which are typically not feasible using traditional polling methods.
Submitted 26 August, 2017;
originally announced August 2017.
-
Controlling for Unobserved Confounds in Classification Using Correlational Constraints
Authors:
Virgile Landeiro,
Aron Culotta
Abstract:
As statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is an unobserved confounding variable $z$ that influences both the features $\mathbf{x}$ and the class variable $y$. When the influence of $z$ changes from training to testing data, we find that the classifier accuracy can degrade rapidly. In our approach, we assume that we can predict the value of $z$ at training time with some error. The prediction for $z$ is then fed to Pearl's back-door adjustment to build our model. Because of the attenuation bias caused by measurement error in $z$, standard approaches to controlling for $z$ are ineffective. In response, we propose a method to properly control for the influence of $z$ by first estimating its relationship with the class variable $y$, then updating predictions for $z$ to match that estimated relationship. By adjusting the influence of $z$, we show that we can build a model that exceeds competing baselines on accuracy as well as on robustness over a range of confounding relationships.
Submitted 11 January, 2018; v1 submitted 5 March, 2017;
originally announced March 2017.
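For reference, the standard form of Pearl's back-door adjustment mentioned in this abstract is written out below in the abstract's own symbols ($\mathbf{x}$, $y$, $z$), assuming a discrete confounder $z$ that satisfies the back-door criterion. This is the textbook identity, not the paper's modified estimator for a noisily predicted $z$. The second line shows ordinary conditioning for contrast: it weights by $p(z \mid \mathbf{x})$ rather than $p(z)$, which is why it is sensitive to changes in how $z$ relates to the features.

```latex
\begin{align}
p\bigl(y \mid \mathrm{do}(\mathbf{x})\bigr) &= \sum_{z} p(y \mid \mathbf{x}, z)\, p(z)
  && \text{(back-door adjustment)} \\
p(y \mid \mathbf{x}) &= \sum_{z} p(y \mid \mathbf{x}, z)\, p(z \mid \mathbf{x})
  && \text{(ordinary conditioning)}
\end{align}
```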
-
Identifying leading indicators of product recalls from online reviews using positive unlabeled learning and domain adaptation
Authors:
Shreesh Kumara Bhat,
Aron Culotta
Abstract:
Consumer protection agencies are charged with safeguarding the public from hazardous products, but the thousands of products under their jurisdiction make it challenging to identify and respond to consumer complaints quickly. From the consumer's perspective, online reviews can provide evidence of product defects, but manually sifting through hundreds of reviews is not always feasible. In this paper, we propose a system to mine Amazon.com reviews to identify products that may pose safety or health hazards. Since labeled data for this task are scarce, our approach combines positive unlabeled learning with domain adaptation to train a classifier from consumer complaints submitted to the U.S. Consumer Product Safety Commission. On a validation set of manually annotated Amazon product reviews, we find that our approach results in an absolute F1 score improvement of 8% over the best competing baseline. Furthermore, we apply the classifier to Amazon reviews of known recalled products; the classifier identifies reviews reporting safety hazards prior to the recall date for 45% of the products. This suggests that the system may be able to provide an early warning system to alert consumers to hazardous products before an official recall is announced.
Submitted 1 March, 2017;
originally announced March 2017.
-
Inferring the Origin Locations of Tweets with Quantitative Confidence
Authors:
Reid Priedhorsky,
Aron Culotta,
Sara Y. Del Valle
Abstract:
Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for good estimates (roughly 30,000 tweets) and models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with a small geographic footprint provide the most useful location signals.
Submitted 15 November, 2013; v1 submitted 16 May, 2013;
originally announced May 2013.
-
Detecting influenza outbreaks by analyzing Twitter messages
Authors:
Aron Culotta
Abstract:
We analyze over 500 million Twitter messages from an eight-month period and find that tracking a small number of flu-related keywords allows us to forecast future influenza rates with high accuracy, obtaining a 95% correlation with national health statistics. We then analyze the robustness of this approach to spurious keyword matches, and we propose a document classification component to filter these misleading messages. We find that this document classifier can reduce error rates by over half in simulated false alarm experiments, though more research is needed to develop methods that are robust in cases of extremely high noise.
Submitted 27 July, 2010;
originally announced July 2010.
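The keyword-tracking idea in this abstract can be illustrated with a small sketch: compute, for each week, the fraction of messages matching a flu-related keyword, then correlate that series with official rates. The messages, keywords, and rates below are invented for illustration, and the function names are hypothetical; the paper's forecasting model and its document classifier are not reproduced here.

```python
import math


def keyword_fraction(messages, keywords):
    """Fraction of messages containing at least one keyword (substring match)."""
    hits = sum(any(k in m.lower() for k in keywords) for m in messages)
    return hits / len(messages) if messages else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical weekly message samples and official rates (illustrative only).
weekly_messages = [
    ["got the flu", "nice weather", "lunch time", "new phone"],
    ["flu season again", "cough and fever", "game tonight", "so tired"],
    ["home sick with flu", "bad cough", "fever all day", "movie night"],
]
keywords = {"flu", "cough", "fever"}
fractions = [keyword_fraction(m, keywords) for m in weekly_messages]
official_rates = [1.2, 2.1, 3.4]
r = pearson(fractions, official_rates)
```

Plain substring matching also fires on unrelated words such as "influence", which is the kind of spurious keyword match the abstract proposes to filter with a document classifier.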