-
Divided by discipline? A systematic literature review on the quantification of online sexism and misogyny using a semi-automated approach
Authors:
Aditi Dutta,
Susan Banducci,
Chico Q. Camargo
Abstract:
Several computational tools have been developed to detect and identify sexism, misogyny, and gender-based hate speech, particularly on online platforms. These tools draw on insights from both social science and computer science. Given the increasing concern over gender-based discrimination in digital spaces, the contested definitions and measurements of sexism, and the rise of interdisciplinary efforts to understand its online manifestations, a systematic literature review is essential for capturing the current state and trajectory of this evolving field. In this review, we make four key contributions: (1) we synthesize the literature into five core themes: definitions of sexism and misogyny, disciplinary divergences, automated detection methods, associated challenges, and design-based interventions; (2) we adopt an interdisciplinary lens, bridging theoretical and methodological divides across disciplines; (3) we highlight critical gaps, including the need for intersectional approaches, the under-representation of non-Western languages and perspectives, and the limited focus on proactive design strategies beyond text classification; and (4) we offer a methodological contribution by applying a rigorous semi-automated systematic review process guided by PRISMA, establishing a replicable standard for future work in this domain. Our findings reveal a clear disciplinary divide in how sexism and misogyny are conceptualized and measured. Through an evidence-based synthesis, we examine how existing studies have attempted to bridge this gap through interdisciplinary collaboration. Drawing on both social science theories and computational modeling practices, we assess the strengths and limitations of current methodologies. Finally, we outline key challenges and future directions for advancing research on the detection and mitigation of online sexism and misogyny.
Submitted 16 May, 2025; v1 submitted 30 September, 2024;
originally announced September 2024.
-
Impact of Network Centrality and Income on Slowing Infection Spread after Outbreaks
Authors:
Shiv G. Yücel,
Rafael H. M. Pereira,
Pedro S. Peixoto,
Chico Q. Camargo
Abstract:
The COVID-19 pandemic has shed light on how the worldwide spread of infectious diseases is shaped by both human mobility networks and socio-economic factors. Few studies, however, have examined the interaction of mobility networks with socio-spatial inequalities to understand the spread of infection. We introduce a novel methodology, called the Infection Delay Model, to calculate how the arrival time of an infection varies geographically, considering both effective distance-based metrics and differences in regions' capacity to isolate -- a feature associated with socioeconomic inequalities. To illustrate an application of the Infection Delay Model, this paper integrates household travel survey data with cell phone mobility data from the São Paulo metropolitan region to assess the effectiveness of lockdowns in slowing the spread of COVID-19. Rather than operating under the assumption that the next pandemic will begin in the same region as the last, the model estimates infection delays under every possible outbreak scenario, allowing for generalizable insights into the effectiveness of interventions to delay a region's first case. The model sheds light on how the effectiveness of lockdowns in slowing the spread of disease is influenced by the interaction of mobility networks and socio-economic levels. We find that a negative relationship emerges between network centrality and the infection delay after lockdown, irrespective of income. Furthermore, for regions across all income and centrality levels, outbreaks starting in less central locations were more effectively slowed by a lockdown. Using the Infection Delay Model, this paper identifies and quantifies a new dimension of disease risk faced by those most central in a mobility network.
Submitted 8 February, 2022;
originally announced February 2022.
-
Estimating Traffic Disruption Patterns with Volunteered Geographic Information
Authors:
Chico Q. Camargo,
Jonathan Bright,
Graham McNeill,
Sridhar Raman,
Scott A. Hale
Abstract:
Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated from static features from the volunteered geographic information site OpenStreetMap (OSM). We use OSM features as predictors for linear regressions of counts of traffic disruptions and traffic volume at 6,500 points in the road network within 112 regions of Oxfordshire, UK. We show that more than half the variation in traffic volume and disruptions can be explained with static features alone, and use cross-validation and recursive feature elimination to evaluate the predictive power and importance of different land use categories. Finally, we show that using OSM's granular point of interest data allows for better predictions than the aggregate categories typically used in studies of transportation and land use.
Submitted 11 July, 2019;
originally announced July 2019.
-
Diagnosing the performance of human mobility models at small spatial scales using volunteered geographic information
Authors:
Chico Q. Camargo,
Jonathan Bright,
Scott A. Hale
Abstract:
Accurate modelling of local population movement patterns is a core contemporary concern for urban policymakers, affecting both the short-term deployment of public transport resources and the longer-term planning of transport infrastructure. Yet, while macro-level population movement models (such as the gravity and radiation models) are well developed, micro-level alternatives are in much shorter supply, with most macro-models known to perform badly in smaller geographic confines. In this paper we take a first step towards remedying this deficit, by leveraging two novel datasets to analyse where and why macro-level models of human mobility break down at small scales. In particular, we use an anonymised aggregate dataset from a major mobility app and combine this with freely available data from OpenStreetMap concerning the land-use composition of different areas around the county of Oxfordshire in the United Kingdom. We show where different models fail, and make the case for a new modelling strategy which moves beyond rough heuristics such as distance and population size towards a detailed, granular understanding of the opportunities presented in different areas of the city.
Submitted 20 May, 2019;
originally announced May 2019.
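The gravity model named in the abstract above is commonly written as a flow proportional to the product of two populations divided by a power of the distance between them. A minimal Python sketch of that textbook form follows; the function name `gravity_flow`, the scaling constant `k`, and the distance exponent `beta` are illustrative assumptions, not the parameterisation used in the paper.

```python
def gravity_flow(pop_origin, pop_dest, distance_km, k=1.0, beta=2.0):
    """Textbook gravity model: flow = k * P_i * P_j / d_ij ** beta.

    k and beta are hypothetical defaults chosen for illustration;
    in practice they would be fitted to observed trip data.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return k * pop_origin * pop_dest / distance_km ** beta


# With beta = 2, doubling the distance cuts the predicted flow to a quarter.
near = gravity_flow(1000, 2000, 10.0)
far = gravity_flow(1000, 2000, 20.0)
print(near, far)
```

Because the prediction depends only on population and distance, two destinations with equal population at equal distance receive identical flows regardless of what is located there, which is the kind of rough heuristic the abstract argues breaks down at small scales.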
-
Measuring the Volatility of the Political agenda in Public Opinion and News Media
Authors:
Chico Q. Camargo,
Scott A. Hale,
Peter John,
Helen Z. Margetts
Abstract:
Recent election surprises, regime changes, and political shocks indicate that political agendas have become more fast-moving and volatile. The ability to measure the complex dynamics of agenda change and capture the nature and extent of volatility in political systems is therefore more crucial than ever before. This study proposes a definition and operationalization of volatility that combines insights from political science, communications, information theory, and computational techniques. The proposed measures of fractionalization and agenda change encompass the shifting salience of issues in the agenda as a whole and allow the study of agendas across different domains. We evaluate these metrics and compare them to other measures such as issue-level survival rates and the Pedersen Index, which uses public-opinion poll data to measure public agendas, as well as traditional media content to measure media agendas in the UK and Germany. We show how these measures complement existing approaches and could be employed in future agenda-setting research.
Submitted 19 September, 2021; v1 submitted 27 August, 2018;
originally announced August 2018.
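For orientation, the Pedersen Index mentioned in the abstract above is conventionally computed as half the sum of absolute changes in shares between two periods. The Python sketch below applies that standard formula to issue shares, alongside a Herfindahl-style fractionalization score (one minus the sum of squared shares). The function names and the example shares are made up for illustration, and the paper's own definitions of fractionalization and agenda change may differ from these textbook forms.

```python
def pedersen_index(shares_before, shares_after):
    """Half the summed absolute change in shares between two periods.

    Both arguments map issue name -> share of attention (shares sum to 1).
    Issues missing from one period are treated as having share 0.
    """
    issues = set(shares_before) | set(shares_after)
    return 0.5 * sum(
        abs(shares_after.get(i, 0.0) - shares_before.get(i, 0.0))
        for i in issues
    )


def fractionalization(shares):
    """One minus the sum of squared shares (assumed textbook form)."""
    return 1.0 - sum(p * p for p in shares.values())


# Hypothetical agenda shares for two consecutive periods.
before = {"economy": 0.5, "health": 0.3, "immigration": 0.2}
after = {"economy": 0.3, "health": 0.3, "immigration": 0.4}

print(pedersen_index(before, after))
print(fractionalization(before))
```

The index is 0 when the agenda is unchanged and 1 when attention moves entirely to issues that previously had none.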
-
Deep learning generalizes because the parameter-function map is biased towards simple functions
Authors:
Guillermo Valle-Pérez,
Chico Q. Camargo,
Ard A. Louis
Abstract:
Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real world problems. This picture also facilitates a novel PAC-Bayes approach where the prior is taken over the DNN input-output function space, rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region then the PAC-Bayes theorem can be used to guarantee good expected generalization for target functions producing high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight generalization PAC-Bayes error bounds which correlate well with the true error on realistic datasets such as MNIST and CIFAR10 and for architectures including convolutional and fully connected networks.
Submitted 21 April, 2019; v1 submitted 22 May, 2018;
originally announced May 2018.