
Showing 1–16 of 16 results for author: Culotta, A

Searching in archive cs.
  1. arXiv:2504.20065

    cs.DL cs.SI

    A Computational Analysis and Visualization of In-Text Reference Networks Across Philosophical Texts

    Authors: Robert Becker, Aron Culotta

    Abstract: We applied computational methods to analyze references across 2,245 philosophical texts, spanning from approximately 550 BCE to 1940 AD, in order to measure patterns in how philosophical ideas have spread over time. Using natural language processing and network analysis, we mapped over 294,970 references between authors, classifying each reference into subdisciplines of philosophy based on its sur…

    Submitted 6 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 57 pages, 41 figures, 3 tables. To submit to the Oxford Journal of the Digital Humanities

  2. arXiv:2204.01790

    cs.SI cs.IR

    Leaders or Followers? A Temporal Analysis of Tweets from IRA Trolls

    Authors: Siva K. Balasubramanian, Mustafa Bilgic, Aron Culotta, Libby Hemphill, Anita Nikolich, Matthew A. Shapiro

    Abstract: The Internet Research Agency (IRA) influences online political conversations in the United States, exacerbating existing partisan divides and sowing discord. In this paper we investigate the IRA's communication strategies by analyzing trending terms on Twitter to identify cases in which the IRA leads or follows other users. Our analysis focuses on over 38M tweets posted between 2016 and 2017 from…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: ICWSM 2022

  3. arXiv:2110.00911

    cs.LG cs.AI

    Enhancing Model Robustness and Fairness with Causality: A Regularization Approach

    Authors: Zhao Wang, Kai Shu, Aron Culotta

    Abstract: Recent work has raised concerns on the risk of spurious correlations and unintended biases in statistical machine learning models that threaten model robustness and fairness. In this paper, we propose a simple and intuitive regularization approach to integrate causal knowledge during model training and build a robust and fair model by emphasizing causal features and de-emphasizing spurious feature…

    Submitted 2 October, 2021; originally announced October 2021.

  4. arXiv:2012.10040

    cs.LG

    Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

    Authors: Zhao Wang, Aron Culotta

    Abstract: Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is from the same distribution as the training data, it can quickly degrade when the test distribution changes. For example, it has been shown that classifiers perform poorly when humans make minor modifications to change the label of an example. One solution to increase m…

    Submitted 17 December, 2020; originally announced December 2020.

  5. arXiv:2010.02466

    cs.CL

    Are Words Commensurate with Actions? Quantifying Commitment to a Cause from Online Public Messaging

    Authors: Zhao Wang, Jennifer Cutler, Aron Culotta

    Abstract: Public entities such as companies and politicians increasingly use online social networks to communicate directly with their constituencies. Often, this public messaging is aimed at aligning the entity with a particular cause or issue, such as the environment or public health. However, as a consumer or voter, it can be difficult to assess an entity's true commitment to a cause based on public mess…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: In IEEE International Conference on Data Mining (ICDM) Workshop on Data science for Human Performance in Social Networks, 2017

  6. arXiv:2010.02458

    cs.LG cs.CL cs.IR

    Identifying Spurious Correlations for Robust Text Classification

    Authors: Zhao Wang, Aron Culotta

    Abstract: The predictions of text classifiers are often driven by spurious correlations, e.g., the term 'Spielberg' correlates with positively reviewed movies, even though the term itself does not semantically convey a positive sentiment. In this paper, we propose a method to distinguish spurious and genuine correlations in text classification. We treat this as a supervised classification problem, using f…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP-2020

    Journal ref: Findings of EMNLP-2020

  7. arXiv:1905.08418

    cs.HC

    Personality and Behavior in Role-based Online Games

    Authors: Zhao Wang, Anna Sapienza, Aron Culotta, Emilio Ferrara

    Abstract: Both offline and online human behaviors are affected by personality. Of special interest are online games, where players have to impersonate specific roles and their behaviors are extensively tracked by the game. In this paper, we propose to study the relationship between players' personality and game behavior in League of Legends (LoL), one of the most popular Multiplayer Online Battle Arena (MO…

    Submitted 20 May, 2019; originally announced May 2019.

  8. arXiv:1811.04890

    cs.LG cs.CL stat.ML

    When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation

    Authors: Zhao Wang, Aron Culotta

    Abstract: Studies across many disciplines have shown that lexical choice can affect audience perception. For example, how users describe themselves in a social media profile can affect their perceived socio-economic status. However, we lack general methods for estimating the causal effect of lexical choice on the perception of a specific sentence. While randomized controlled trials may provide good estimate…

    Submitted 14 November, 2018; v1 submitted 12 November, 2018; originally announced November 2018.

    Comments: AAAI 2019

  9. arXiv:1804.06759

    cs.CL cs.SI

    Forecasting the presence and intensity of hostility on Instagram using linguistic and social features

    Authors: Ping Liu, Joshua Guberman, Libby Hemphill, Aron Culotta

    Abstract: Online antisocial behavior, such as cyberbullying, harassment, and trolling, is a widespread problem that threatens free discussion and has negative physical and mental health consequences for victims and communities. While prior work has proposed automated methods to identify hostile comments in online discussions, these methods work retrospectively on comments that have already been posted, maki…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: ICWSM'18

  10. arXiv:1711.06241

    cs.IT cs.SI q-bio.PE stat.AP

    Deceptiveness of internet data for disease surveillance

    Authors: Reid Priedhorsky, Dave Osthus, Ashlynn R. Daughton, Kelly R. Moran, Aron Culotta

    Abstract: Quantifying how many people are or will be sick, and where, is a critical ingredient in reducing the burden of disease because it helps the public health system plan and implement effective outbreak response. This process of disease surveillance is currently based on data gathering using clinical and laboratory methods; this distributed human contact and resulting bureaucratic data aggregation yie…

    Submitted 31 July, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: 26 pages, 6 figures

    Report number: LA-UR 17-24564

    ACM Class: H.1.1; J.3; H.2.8; H.3.5

  11. arXiv:1709.04108

    cs.CV cs.LG stat.ML

    Co-training for Demographic Classification Using Deep Learning from Label Proportions

    Authors: Ehsan Mohammady Ardehaly, Aron Culotta

    Abstract: Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportion (LLP) se…

    Submitted 12 September, 2017; originally announced September 2017.

  12. Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions

    Authors: Ehsan Mohammady Ardehaly, Aron Culotta

    Abstract: Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since it is costly and time-consuming to annotate data for traditional supervised classification, we instead propose scalable Learning…

    Submitted 26 August, 2017; originally announced August 2017.

  13. arXiv:1703.01671

    cs.AI cs.CL

    Controlling for Unobserved Confounds in Classification Using Correlational Constraints

    Authors: Virgile Landeiro, Aron Culotta

    Abstract: As statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is an unobserved confounding variable $z$ that influences both the features $\mathbf{x}$ and the class variable $y$. When the influence of $z$ changes from traini…

    Submitted 11 January, 2018; v1 submitted 5 March, 2017; originally announced March 2017.

    Comments: 9 pages

  14. arXiv:1703.00518

    cs.IR cs.SI

    Identifying leading indicators of product recalls from online reviews using positive unlabeled learning and domain adaptation

    Authors: Shreesh Kumara Bhat, Aron Culotta

    Abstract: Consumer protection agencies are charged with safeguarding the public from hazardous products, but the thousands of products under their jurisdiction make it challenging to identify and respond to consumer complaints quickly. From the consumer's perspective, online reviews can provide evidence of product defects, but manually sifting through hundreds of reviews is not always feasible. In this pape…

    Submitted 1 March, 2017; originally announced March 2017.

  15. arXiv:1305.3932

    cs.SI cs.HC cs.LG

    Inferring the Origin Locations of Tweets with Quantitative Confidence

    Authors: Reid Priedhorsky, Aron Culotta, Sara Y. Del Valle

    Abstract: Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of gau…

    Submitted 15 November, 2013; v1 submitted 16 May, 2013; originally announced May 2013.

    Comments: 14 pages, 6 figures. Version 2: Move mathematics to appendix, 2 new references, various other presentation improvements. Version 3: Various presentation improvements, accepted at ACM CSCW 2014

    Report number: LA-UR 13-23557

    ACM Class: D.2.8; H.3.5; I.2.6; I.2.7; K.4.1

  16. arXiv:1007.4748

    cs.IR cs.CL

    Detecting influenza outbreaks by analyzing Twitter messages

    Authors: Aron Culotta

    Abstract: We analyze over 500 million Twitter messages from an eight month period and find that tracking a small number of flu-related keywords allows us to forecast future influenza rates with high accuracy, obtaining a 95% correlation with national health statistics. We then analyze the robustness of this approach to spurious keyword matches, and we propose a document classification component to filter th…

    Submitted 27 July, 2010; originally announced July 2010.