-
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets
Authors:
Erica Cai,
Sean McQuade,
Kevin Young,
Brendan O'Connor
Abstract:
When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets cannot be used to evaluate this question, since their KGs are highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG (https://doi.org/10.5281/zenodo.15427977), which is a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities -- useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments demonstrate significant variability in model performance across datasets, underscoring AffilKG's ability to enable two critical advances: (1) benchmarking how extraction errors propagate to graph-level analyses (e.g., community structure), and (2) validating KG extraction methods for real-world social science research.
Submitted 15 May, 2025;
originally announced May 2025.
-
Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement
Authors:
Tessa Masis,
Zhangqi Duan,
Weiai Wayne Xu,
Ethan Zuckerman,
Jane Yeahin Pyo,
Brendan O'Connor
Abstract:
The #StopAsianHate (SAH) movement is a broad social movement against violence targeting Asians and Asian Americans, beginning in 2021 in response to racial discrimination related to COVID-19 and sparking worldwide conversation about anti-Asian hate. However, research on the online SAH movement has focused on English-speaking participants so the spread of the movement outside of the United States is largely unknown. In addition, there have been no long-term studies of SAH so the extent to which it has been successfully sustained over time is not well understood. We present an analysis of 6.5 million "#StopAsianHate" tweets from 2.2 million users all over the globe and spanning 60 different languages, constituting the first study of the non-English and transnational component of the online SAH movement. Using a combination of topic modeling, user modeling, and hand annotation, we identify and characterize the dominant discussions and users participating in the movement and draw comparisons of English versus non-English topics and users. We discover clear differences in events driving topics, where spikes in English tweets are driven by violent crimes in the US but spikes in non-English tweets are driven by transnational incidents of anti-Asian sentiment towards symbolic representatives of Asian nations. We also find that global K-pop fans were quick to adopt the SAH movement and, in fact, sustained it for longer than any other user group. Our work contributes to understanding the transnationality and evolution of the SAH movement, and more generally to exploring upward scale shift and public attention in large-scale multilingual online activism.
Submitted 4 March, 2025;
originally announced March 2025.
-
A Semantic Parsing Algorithm to Solve Linear Ordering Problems
Authors:
Maha Alkhairy,
Vincent Homer,
Brendan O'Connor
Abstract:
We develop an algorithm to semantically parse linear ordering problems, which require a model to arrange entities using deductive reasoning. Our method takes as input a number of premises and candidate statements, parsing them to a first-order logic of an ordering domain, and then utilizes constraint logic programming to infer the truth of proposed statements about the ordering.
Our semantic parser transforms Heim and Kratzer's syntax-based compositional formal semantic rules to a computational algorithm. This transformation involves introducing abstract types and templates based on their rules, and introduces a dynamic component to interpret entities within a contextual framework.
Our symbolic system, the Formal Semantic Logic Inferer (FSLI), is applied to BIG-bench's logical_deduction multiple-choice problems, achieving perfect accuracy, compared to 67.06% for the best-performing LLM (GPT-4) and 87.63% for the hybrid system Logic-LM.
These promising results demonstrate the benefit of developing a semantic parsing algorithm driven by first-order logic constructs.
Submitted 12 February, 2025;
originally announced February 2025.
-
Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
Authors:
Marisa Hudspeth,
Brendan O'Connor,
Laure Thompson
Abstract:
Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits that draw from the existing treebanks which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.
Submitted 13 August, 2024;
originally announced August 2024.
-
Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input
Authors:
Tessa Masis,
Brendan O'Connor
Abstract:
Geo-entity linking is the task of linking a location mention to the real-world geographic location. In this paper we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available and existing ones are often rule-based, which break easily in social media settings, or LLM-based, which are too expensive for large-scale datasets. We present a method which represents real-world locations as averaged embeddings from labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and discuss progress and problems with evaluating at different geographic granularities.
Submitted 29 April, 2024;
originally announced April 2024.
-
A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction
Authors:
Erica Cai,
Brendan O'Connor
Abstract:
Current social science efforts automatically populate event databases of "who did what to whom?" tuples, by applying event extraction (EE) to text such as news. The event databases are used to analyze sociopolitical dynamics between actor pairs (dyads) in, e.g., international relations. While most EE methods heavily rely on rules or supervised learning, zero-shot event extraction could potentially allow researchers to flexibly specify arbitrary event classes for new research questions. Unfortunately, we find that current zero-shot EE methods, as well as a naive zero-shot approach of simple generative language model (LM) prompting, perform poorly for dyadic event extraction; most suffer from word sense ambiguity, modality sensitivity, and computational inefficiency. We address these challenges with a new fine-grained, multi-stage instruction-following generative LM pipeline, proposing a Monte Carlo approach to deal with, and even take advantage of, nondeterminism of generative outputs. Our pipeline includes explicit stages of linguistic analysis (synonym generation, contextual disambiguation, argument realization, event modality), improving control and interpretability compared to purely neural methods. This method outperforms other zero-shot EE approaches, and outperforms naive applications of generative LMs by at least 17 F1 percentage points. The pipeline's filtering mechanism greatly improves computational efficiency, allowing it to perform as few as 12% of the queries that a previous zero-shot method uses. Finally, we demonstrate our pipeline's application to dyadic international relations analysis.
Submitted 2 June, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
Authors:
Brendan O'Connor,
Simon Dixon
Abstract:
Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise well-established among VC tasks, which has been shown to improve our model's SVC performance. We first trained a singer identity embedding (SIE) network on mel-spectrograms of singer recordings to produce singer-specific variance encodings using contrastive learning. We subsequently trained a well-known autoencoder framework (AutoVC) conditioned on these SIEs, and measured differences in SVC performance when using different latent regressor loss components. We found that using this loss w.r.t. SIEs leads to better performance than w.r.t. bottleneck embeddings, where converted audio is more natural and specific towards target singers. The inclusion of this loss component has the advantage of explicitly forcing the network to reconstruct with timbral similarity, and also negates the effect of poor disentanglement in AutoVC's bottleneck embeddings. We demonstrate peculiar diversity between computational and human evaluations on singer-converted audio clips, which highlights the necessity of both. We also propose a pitch-matching mechanism between source and target singers to ensure these evaluations are not influenced by differences in pitch register.
Submitted 27 February, 2023;
originally announced February 2023.
-
Examining Political Rhetoric with Epistemic Stance Detection
Authors:
Ankita Gupta,
Su Lin Blodgett,
Justin H Gross,
Brendan O'Connor
Abstract:
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
Submitted 5 January, 2023; v1 submitted 29 December, 2022;
originally announced December 2022.
-
ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution
Authors:
Ankita Gupta,
Marzena Karpinska,
Wenlong Zhao,
Kalpesh Krishna,
Jack Merullo,
Luke Yeh,
Mohit Iyyer,
Brendan O'Connor
Abstract:
Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreferences and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers to curate a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable quality annotations were already achievable (>90% agreement between the crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify the presence of linguistic cases that our annotators unanimously agree upon but lack unified treatments (e.g., generic pronouns, appositives) in existing datasets. We propose the research community should revisit these phenomena when curating future unified annotation guidelines.
Submitted 13 October, 2022;
originally announced October 2022.
-
Tone prediction and orthographic conversion for Basaa
Authors:
Ilya Nikitin,
Brian O'Connor,
Anastasia Safonova
Abstract:
In this paper, we present a seq2seq approach for transliterating missionary Basaa orthographies into the official orthography. Our model uses pre-trained Basaa missionary and official orthography corpora using BERT. Since Basaa is a low-resource language, we have decided to use the mT5 model for our project. Before training our model, we pre-processed our corpora by eliminating one-to-one correspondences between spellings and unifying characters variably containing either one to two characters into single-character form. Our best mT5 model achieved a CER equal to 12.6747 and a WER equal to 40.1012.
Submitted 13 October, 2022;
originally announced October 2022.
-
Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties
Authors:
Tessa Masis,
Anissa Neal,
Lisa Green,
Brendan O'Connor
Abstract:
The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature's distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature's function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
Submitted 15 September, 2022;
originally announced September 2022.
-
A New Method for the High-Precision Assessment of Tumor Changes in Response to Treatment
Authors:
P. D. Tar,
N. A. Thacker,
J. P. B. O'Connor
Abstract:
Imaging demonstrates that preclinical and human tumors are heterogeneous, i.e. a single tumor can exhibit multiple regions that behave differently during both normal development and also in response to treatment. The large variations observed in control group tumors can obscure detection of significant therapeutic effects due to the ambiguity in attributing causes of change. This can hinder development of effective therapies due to limitations in experimental design, rather than due to therapeutic failure. An improved method to model biological variation and heterogeneity in imaging signals is described. Specifically, Linear Poisson modelling (LPM) evaluates changes in apparent diffusion coefficient (ADC) before and 72 hours after radiotherapy, in two xenograft models of colorectal cancer. The statistical significance of measured changes is compared to that attainable using a conventional t-test analysis on basic ADC distribution parameters. When LPMs were applied to treated tumors, the LPMs detected highly significant changes. The analyses were significant for all tumors, equating to a gain in power of 4-fold (i.e. equivalent to having a sample size 16 times larger), compared with the conventional approach. In contrast, highly significant changes are only detected at a cohort level using t-tests, restricting their potential use within personalised medicine and increasing the number of animals required during testing. Furthermore, LPM enabled the relative volumes of responding and non-responding tissue to be estimated for each xenograft model. Leave-one-out analysis of the treated xenografts provided quality control and identified potential outliers, raising confidence in LPM data at clinically relevant sample sizes.
Submitted 7 September, 2022;
originally announced September 2022.
-
ClioQuery: Interactive Query-Oriented Text Analytics for Comprehensive Investigation of Historical News Archives
Authors:
Abram Handler,
Narges Mahyar,
Brendan O'Connor
Abstract:
Historians and archivists often find and analyze the occurrences of query words in newspaper archives, to help answer fundamental questions about society. But much work in text analytics focuses on helping people investigate other textual units, such as events, clusters, ranked documents, entity relationships, or thematic hierarchies. Informed by a study into the needs of historians and archivists, we thus propose ClioQuery, a text analytics system uniquely organized around the analysis of query words in context. ClioQuery applies text simplification techniques from natural language processing to help historians quickly and comprehensively gather and analyze all occurrences of a query word across an archive. It also pairs these new NLP methods with more traditional features like linked views and in-text highlighting to help engender trust in summarization techniques. We evaluate ClioQuery with two separate user studies, in which historians explain how ClioQuery's novel text simplification features can help facilitate historical research. We also evaluate with a separate quantitative comparison study, which shows that ClioQuery helps crowdworkers find and remember historical information. Such results suggest possible new directions for text analytics in other query-oriented settings.
Submitted 10 April, 2022;
originally announced April 2022.
-
A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments
Authors:
Robert L. Grossman,
Rebecca R. Boyles,
Brandi N. Davis-Dusenbery,
Amanda Haddock,
Allison P. Heath,
Brian D. O'Connor,
Adam C. Resnick,
Deanne M. Taylor,
Stan Ahalt
Abstract:
As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms, as a growing amount of data is being hosted in cloud-based platforms. A well accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies to cloud-based computing environments that we call a Secure and Authorized FAIR Environment (SAFE). SAFE environments require data and platform governance structures and are designed to support the interoperability of sensitive or controlled access data, such as biomedical data. A SAFE environment is a cloud platform that has been approved through a defined data and platform governance process as authorized to hold data from another cloud platform and exposes appropriate APIs for the two platforms to interoperate.
Submitted 15 February, 2024; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Zero-shot Singing Technique Conversion
Authors:
Brendan O'Connor,
Simon Dixon,
George Fazekas
Abstract:
In this paper we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a decoder is conditioned during training. By swapping out a source singer's technique information for that of the target's during conversion, the input spectrogram is reconstructed with the target's technique. We document the beneficial effects of omitting the latent loss, the importance of sequential training, and our process for fine-tuning the bottleneck. We also conducted a listening study where participants rate the specificity of technique-converted voices as well as their naturalness. From this we are able to conclude how effective the technique conversions are and how different conditions affect them, while assessing the model's ability to reconstruct its input data.
Submitted 16 November, 2021;
originally announced November 2021.
-
An Exploratory Study on Perceptual Spaces of the Singing Voice
Authors:
Brendan O'Connor,
Simon Dixon,
George Fazekas
Abstract:
Sixty participants provided dissimilarity ratings between various singing techniques. Multidimensional scaling, class averaging and clustering techniques were used to analyse timbral spaces and how they change between different singers, genders and registers. Clustering analysis showed that ground-truth similarity and silhouette scores were not significantly different between gender or register conditions, while similarity scores were positively correlated with participants' instrumental abilities and task comprehension. Participant feedback showed how a revised study design might mitigate noise in our data, leading to more detailed statistical results. Timbre maps and class distance analysis showed us which singing techniques remained similar to one another across gender and register conditions. This research provides insight into how the timbre space of singing changes under different conditions, highlights the subjectivity of perception between participants, and provides generalised timbre maps for regularisation in machine learning.
Submitted 15 November, 2021;
originally announced November 2021.
-
Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects
Authors:
Katherine A. Keith,
Douglas Rice,
Brendan O'Connor
Abstract:
Using observed language to understand interpersonal interactions is important in high-stakes decision making. We propose a causal research design for observational (non-experimental) data to estimate the natural direct and indirect effects of social group signals (e.g. race or gender) on speakers' responses with separate aspects of language as causal mediators. We illustrate the promises and challenges of this framework via a theoretical case study of the effect of an advocate's gender on interruptions from justices during U.S. Supreme Court oral arguments. We also discuss challenges in conceptualizing and operationalizing causal variables such as gender and language that comprise many components, and we articulate technical open challenges such as temporal dependence between language mediators in conversational settings.
Submitted 15 September, 2021;
originally announced September 2021.
-
Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
Authors:
Andrew Halterman,
Katherine A. Keith,
Sheikh Muhammad Sarwar,
Brendan O'Connor
Abstract:
Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations. In contrast to other datasets with structured event representations, we gather annotations by posing natural questions, and evaluate off-the-shelf models for three different tasks: sentence classification, document ranking, and temporal aggregation of target events. We present baseline results from zero-shot BERT-based models fine-tuned on natural language inference and passage retrieval tasks. Our novel corpus-level evaluations and annotation approach can guide creation of similar social-science-oriented resources in the future.
Submitted 27 May, 2021;
originally announced May 2021.
-
Analyzing Gender Bias within Narrative Tropes
Authors:
Dhruvil Gala,
Mohammad Omar Khursheed,
Hannah Lerner,
Brendan O'Connor,
Mohit Iyyer
Abstract:
Popular media reflects and reinforces societal biases through the use of tropes, which are narrative elements, such as archetypal characters and plot arcs, that occur frequently across media. In this paper, we specifically investigate gender bias within a large collection of tropes. To enable our study, we crawl tvtropes.org, an online user-created repository that contains 30K tropes associated with 1.9M examples of their occurrences across film, television, and literature. We automatically score the "genderedness" of each trope in our TVTROPES dataset, which enables an analysis of (1) highly-gendered topics within tropes, (2) the relationship between gender bias and popular reception, and (3) how the gender of a work's creator correlates with the types of tropes that they use.
Submitted 30 October, 2020;
originally announced November 2020.
-
Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty
Authors:
Katherine A. Keith,
Christoph Teichmann,
Brendan O'Connor,
Edgar Meij
Abstract:
Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors' annotations and text measurements we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find for this application (1) some annotator disagreements of economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword-matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.
Submitted 9 October, 2020;
originally announced October 2020.
-
Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates
Authors:
Katherine A. Keith,
David Jensen,
Brendan O'Connor
Abstract:
Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an individual's entire history of social media posts or the content of a news article could provide a rich measurement of multiple confounders. Yet, methods and applications for this problem are scattered across different communities and evaluation practices are inconsistent. This review is the first to gather and categorize these examples and provide a guide to data-processing and evaluation decisions. Despite increased attention on adjusting for confounding using text, there are still many open problems, which we highlight in this paper.
Submitted 1 May, 2020;
originally announced May 2020.
-
Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts
Authors:
Jack Merullo,
Luke Yeh,
Abram Handler,
Alvin Grissom II,
Brendan O'Connor,
Mohit Iyyer
Abstract:
Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes. Prior studies based on small datasets and manual coding show that such theatrics evince commentator bias in sports broadcasts. To examine this phenomenon, we assemble FOOTBALL, which contains 1,455 broadcast transcripts from American football games across six decades that are automatically annotated with 250K player mentions and linked with racial metadata. We identify major confounding factors for researchers examining racial bias in FOOTBALL, and perform a computational analysis that supports conclusions from prior social science studies.
Submitted 18 October, 2019; v1 submitted 7 September, 2019;
originally announced September 2019.
-
Query-focused Sentence Compression in Linear Time
Authors:
Abram Handler,
Brendan O'Connor
Abstract:
Search applications often display shortened sentences which must contain certain query terms and must fit within the space constraints of a user interface. This work introduces a new transition-based sentence compression technique developed for such settings. Our query-focused method constructs length and lexically constrained compressions in linear time, by growing a subgraph in the dependency parse of a sentence. This theoretically efficient approach achieves an 11X empirical speedup over baseline ILP methods, while better reconstructing gold constrained shortenings. Such speedups help query-focused applications, because users are measurably hindered by interface lags. Additionally, our technique does not require an ILP solver or a GPU.
Submitted 17 September, 2019; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Human acceptability judgements for extractive sentence compression
Authors:
Abram Handler,
Brian Dillon,
Brendan O'Connor
Abstract:
Recent approaches to English-language sentence compression rely on parallel corpora consisting of sentence-compression pairs. However, a sentence may be shortened in many different ways, which each might be suited to the needs of a particular application. Therefore, in this work, we collect and model crowdsourced judgements of the acceptability of many possible sentence shortenings. We then show how a model of such judgements can be used to support a flexible approach to the compression task. We release our model and dataset for future work.
Submitted 1 February, 2019;
originally announced February 2019.
-
Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation
Authors:
Johnny Tian-Zheng Wei,
Khiem Pham,
Brian Dillon,
Brendan O'Connor
Abstract:
Sequence to sequence (seq2seq) models are often employed in settings where the target output is natural language. However, the syntactic properties of the language generated from these models are not well understood. We explore whether such output belongs to a formal and realistic grammar, by employing the English Resource Grammar (ERG), a broad coverage, linguistically precise HPSG-based grammar of English. From a French to English parallel corpus, we analyze the parseability and grammatical constructions occurring in output from a seq2seq translation model. Over 93% of the model translations are parseable, suggesting that it learns to generate conforming to a grammar. The model has trouble learning the distribution of rarer syntactic rules, and we pinpoint several constructions that differentiate translations between the references and our model.
Submitted 6 September, 2018;
originally announced September 2018.
-
Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses
Authors:
Katherine A. Keith,
Su Lin Blodgett,
Brendan O'Connor
Abstract:
Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full joint distribution of parse trees defined by a transition-based parsing model, and demonstrate the use of the samples in probabilistic dependency analysis. First, we define the new task of dependency path prediction, inferring syntactic substructures over part of a sentence, and provide the first analysis of performance on this task. Second, we demonstrate the usefulness of our Monte Carlo syntax marginal method for parser error analysis and calibration. Finally, we use this method to propagate parse uncertainty to two downstream information extraction applications: identifying persons killed by police and semantic role assignment.
Submitted 16 April, 2018;
originally announced April 2018.
-
Rookie: A unique approach for exploring news archives
Authors:
Abram Handler,
Brendan O'Connor
Abstract:
News archives are an invaluable primary source for placing current events in historical context. But current search engine tools do a poor job at uncovering broad themes and narratives across documents. We present Rookie: a practical software system which uses natural language processing (NLP) to help readers, reporters and editors uncover broad stories in news archives. Unlike prior work, Rookie's design emerged from 18 months of iterative development in consultation with editors and computational journalists. This process led to a dramatically different approach from previous academic systems with similar goals. Our efforts offer a generalizable case study for others building real-world journalism software using NLP.
Submitted 6 August, 2017;
originally announced August 2017.
-
Identifying civilians killed by police with distantly supervised entity-event extraction
Authors:
Katherine A. Keith,
Abram Handler,
Michael Pinkham,
Cara Magliozzi,
Joshua McDuffie,
Brendan O'Connor
Abstract:
We propose a new, socially-impactful task for natural language processing: from a news corpus, extract names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and present a model to solve this problem that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outperforms two off-the-shelf event extractor systems, and it can suggest candidate victim names in some cases faster than one of the major manually-collected police fatality databases.
Submitted 21 July, 2017;
originally announced July 2017.
-
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English
Authors:
Su Lin Blodgett,
Brendan O'Connor
Abstract:
We highlight an important frontier in algorithmic fairness: disparity in the quality of natural language processing algorithms when applied to language from authors of different social groups. For example, current systems sometimes analyze the language of females and minorities more poorly than they do of whites and males. We conduct an empirical analysis of racial disparity in language identification for tweets written in African-American English, and discuss implications of disparity in NLP.
Submitted 30 June, 2017;
originally announced July 2017.
-
Demographic Dialectal Variation in Social Media: A Case Study of African-American English
Authors:
Su Lin Blodgett,
Lisa Green,
Brendan O'Connor
Abstract:
Though dialectal language is increasingly abundant on social media, few resources exist for developing NLP tools to handle such language. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter. We propose a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages, and we verify that this language follows well-known AAE linguistic phenomena. In addition, we analyze the quality of existing language identification and dependency parsing tools on AAE-like text, demonstrating that they perform poorly on such text compared to text associated with white speakers. We also provide an ensemble classifier for language identification which eliminates this disparity and release a new corpus of tweets containing AAE-like language.
Submitted 31 August, 2016;
originally announced August 2016.
-
Visualizing textual models with in-text and word-as-pixel highlighting
Authors:
Abram Handler,
Su Lin Blodgett,
Brendan O'Connor
Abstract:
We explore two techniques which use color to make sense of statistical text models. One method uses in-text annotations to illustrate a model's view of particular tokens in particular documents. Another uses a high-level, "words-as-pixels" graphic to display an entire corpus. Together, these methods offer both zoomed-in and zoomed-out perspectives into a model's understanding of text. We show how these interconnected methods help diagnose a classifier's poor performance on Twitter slang, and make sense of a topic model on historical political texts.
Submitted 20 June, 2016;
originally announced June 2016.
-
Posterior calibration and exploratory analysis for natural language processing models
Authors:
Khanh Nguyen,
Brendan O'Connor
Abstract:
Many models in natural language processing define probabilistic distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to trust and not trust the NLP analysis. We present a method to analyze calibration, and apply it to compare the miscalibration of several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
Submitted 2 September, 2015; v1 submitted 20 August, 2015;
originally announced August 2015.
-
ARKref: a rule-based coreference resolution system
Authors:
Brendan O'Connor,
Michael Heilman
Abstract:
ARKref is a tool for noun phrase coreference. It is a deterministic, rule-based system that uses syntactic information from a constituent parser, and semantic information from an entity recognition component. Its architecture is based on the work of Haghighi and Klein (2009). ARKref was originally written in 2009. At the time of writing, the last released version was in March 2011. This document describes that version, which is open-source and publicly available at: http://www.ark.cs.cmu.edu/ARKref
Submitted 7 October, 2013;
originally announced October 2013.
-
Evidence that Cross-Domain Re-interpretations of Creative Ideas are Recognizable
Authors:
Apara Ranjan,
Liane Gabora,
Brian O'Connor
Abstract:
The goal of this study was to investigate the translate-ability of creative works into other domains. We tested whether people were able to recognize which works of art were inspired by which pieces of music. Three expert painters created four paintings, each of which was the artist's interpretation of one of four different pieces of instrumental music. Participants were able to identify which paintings were inspired by which pieces of music at statistically significant above-chance levels. The findings support the hypothesis that creative ideas can exist in an at least somewhat domain-independent state of potentiality and become more well-defined as they are actualized in accordance with the constraints of a particular domain.
Submitted 9 July, 2019; v1 submitted 1 October, 2013;
originally announced October 2013.
-
Learning Frames from Text with an Unsupervised Latent Variable Model
Authors:
Brendan O'Connor
Abstract:
We develop a probabilistic latent-variable model to discover semantic frames---types of events and their participants---from corpora. We present a Dirichlet-multinomial model in which frames are latent categories that explain the linking of verb-subject-object triples, given document-level sparsity. We analyze what the model learns, and compare it to FrameNet, noting it learns some novel and interesting frames. This document also contains a discussion of inference issues, including concentration parameter learning; and a small-scale error analysis of syntactic parsing accuracy.
Submitted 28 July, 2013;
originally announced July 2013.
-
A framework for (under)specifying dependency syntax without overloading annotators
Authors:
Nathan Schneider,
Brendan O'Connor,
Naomi Saphra,
David Bamman,
Manaal Faruqui,
Noah A. Smith,
Chris Dyer,
Jason Baldridge
Abstract:
We introduce a framework for lightweight dependency syntax annotation. Our formalism builds upon the typical representation for unlabeled dependencies, permitting a simple notation and annotation workflow. Moreover, the formalism encourages annotators to underspecify parts of the syntax if doing so would streamline the annotation process. We demonstrate the efficacy of this annotation on three languages and develop algorithms to evaluate and compare underspecified annotations.
Submitted 14 June, 2013; v1 submitted 9 June, 2013;
originally announced June 2013.
-
An Online Environment for Democratic Deliberation: Motivations, Principles, and Design
Authors:
Todd Davies,
Brendan O'Connor,
Alex Cochran,
Jonathan J. Effrat,
Andrew Parker,
Benjamin Newman,
Aaron Tam
Abstract:
We have created a platform for online deliberation called Deme (which rhymes with 'team'). Deme is designed to allow groups of people to engage in collaborative drafting, focused discussion, and decision making using the Internet. The Deme project has evolved greatly from its beginning in 2003. This chapter outlines the thinking behind Deme's initial design: our motivations for creating it, the principles that guided its construction, and its most important design features. The version of Deme described here was written in PHP and was deployed in 2004 and used by several groups (including organizers of the 2005 Online Deliberation Conference). Other papers describe later developments in the Deme project (see Davies et al. 2005, 2008; Davies and Mintz 2009).
Submitted 15 February, 2013;
originally announced February 2013.
-
Displaying Asynchronous Reactions to a Document: Two Goals and a Design
Authors:
Todd Davies,
Benjamin Newman,
Brendan O'Connor,
Aaron Tam,
Leo Perry
Abstract:
We describe and motivate two goals for the screen display of asynchronous text deliberation pertaining to a document: (1) visibility of relationships between comments and the text they reference, between different comments, and between group members and the document and discussion, and (2) distinguishability of boundaries between contextually related and unrelated text and comments and between individual authors of documents and comments. Interfaces for document-centered discussion generally fail to fulfill one or both of these goals as well as they could. We describe the design of the new version of Deme, a Web-based platform for online deliberation, and argue that it achieves the two goals better than other recent designs.
Submitted 14 February, 2013;
originally announced February 2013.
-
"Groupware for Groups": Problem-Driven Design in Deme
Authors:
Todd Davies,
Brendan O'Connor,
Alex Cochran,
Andrew Parker
Abstract:
Design choices can be clarified when group interaction software is directed at solving the interaction needs of particular groups that pre-date the groupware. We describe an example: the Deme platform for online deliberation. Traditional threaded conversation systems are insufficient for solving the problem at which Deme is aimed, namely, that the democratic process in grassroots community groups is undermined both by the limited availability of group members for face-to-face meetings and by constraints on the use of information in real-time interactions. We describe and motivate design elements, either implemented or planned for Deme, that address this problem. We believe that "problem focused" design of software for preexisting groups provides a useful framework for evaluating the appropriateness of design elements in groupware generally.
Submitted 13 February, 2013;
originally announced February 2013.
-
Diffusion of Lexical Change in Social Media
Authors:
Jacob Eisenstein,
Brendan O'Connor,
Noah A. Smith,
Eric P. Xing
Abstract:
Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity -- especially with regard to race -- plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified "netspeak" dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.
Submitted 23 November, 2014; v1 submitted 18 October, 2012;
originally announced October 2012.