Showing 1–40 of 40 results for author: O'Connor, B

Searching in archive cs.
  1. arXiv:2505.10798  [pdf, other]

    cs.CL

    Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets

    Authors: Erica Cai, Sean McQuade, Kevin Young, Brendan O'Connor

    Abstract: When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets cannot be used to evaluate this question, since their KGs are highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG (https://doi.org/10.5281/zenodo.15427977), which is a collection of six datasets that…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2503.02707  [pdf, other]

    cs.CL cs.CY

    Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement

    Authors: Tessa Masis, Zhangqi Duan, Weiai Wayne Xu, Ethan Zuckerman, Jane Yeahin Pyo, Brendan O'Connor

    Abstract: The #StopAsianHate (SAH) movement is a broad social movement against violence targeting Asians and Asian Americans, beginning in 2021 in response to racial discrimination related to COVID-19 and sparking worldwide conversation about anti-Asian hate. However, research on the online SAH movement has focused on English-speaking participants so the spread of the movement outside of the United States i…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: WebSci'25

  3. arXiv:2502.08415  [pdf, other]

    cs.CL cs.LO

    A Semantic Parsing Algorithm to Solve Linear Ordering Problems

    Authors: Maha Alkhairy, Vincent Homer, Brendan O'Connor

    Abstract: We develop an algorithm to semantically parse linear ordering problems, which require a model to arrange entities using deductive reasoning. Our method takes as input a number of premises and candidate statements, parsing them to a first-order logic of an ordering domain, and then utilizes constraint logic programming to infer the truth of proposed statements about the ordering. Our semantic par…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 3 figures, 9 pages main paper and 6 pages references and appendix

  4. arXiv:2408.06675  [pdf, other]

    cs.CL

    Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time

    Authors: Marisa Hudspeth, Brendan O'Connor, Laure Thompson

    Abstract: Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to…

    Submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2404.18784  [pdf, other]

    cs.CL cs.AI

    Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input

    Authors: Tessa Masis, Brendan O'Connor

    Abstract: Geo-entity linking is the task of linking a location mention to the real-world geographic location. In this paper we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available and existing ones are often rule-based, which break easily in social media settings, or LLM-based, which are too ex…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: NLP+CSS workshop at NAACL 2024

  6. arXiv:2305.15051  [pdf, other]

    cs.CL

    A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction

    Authors: Erica Cai, Brendan O'Connor

    Abstract: Current social science efforts automatically populate event databases of "who did what to whom?" tuples, by applying event extraction (EE) to text such as news. The event databases are used to analyze sociopolitical dynamics between actor pairs (dyads) in, e.g., international relations. While most EE methods heavily rely on rules or supervised learning, zero-shot event extraction could pote…

    Submitted 2 June, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following; oral presentation at New England Natural Language Processing, 2023; 17 pages of text including references and appendix

  7. arXiv:2302.13678  [pdf, other]

    cs.SD cs.AI eess.AS

    A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

    Authors: Brendan O'Connor, Simon Dixon

    Abstract: Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise well-established among VC tasks, which has been shown to improve our model's SVC performance. We first trained a singer identity embedding (SIE) network on mel-sp…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Submitted to the Sound and Music Computing Conference 2023

  8. arXiv:2212.14486  [pdf, other]

    cs.CL

    Examining Political Rhetoric with Epistemic Stance Detection

    Authors: Ankita Gupta, Su Lin Blodgett, Justin H Gross, Brendan O'Connor

    Abstract: Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance predict…

    Submitted 5 January, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: Forthcoming in Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) at EMNLP 2022

  9. arXiv:2210.07188  [pdf, other]

    cs.CL

    ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution

    Authors: Ankita Gupta, Marzena Karpinska, Wenlong Zhao, Kalpesh Krishna, Jack Merullo, Luke Yeh, Mohit Iyyer, Brendan O'Connor

    Abstract: Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreferences and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers to curate a unified set of guidelines suitable for annotators with var…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: preprint (19 pages), code in https://github.com/gnkitaa/ezCoref

  10. arXiv:2210.06986  [pdf, ps, other]

    cs.CL

    Tone prediction and orthographic conversion for Basaa

    Authors: Ilya Nikitin, Brian O'Connor, Anastasia Safonova

    Abstract: In this paper, we present a seq2seq approach for transliterating missionary Basaa orthographies into the official orthography. Our model uses pre-trained Basaa missionary and official orthography corpora using BERT. Since Basaa is a low-resource language, we have decided to use the mT5 model for our project. Before training our model, we pre-processed our corpora by eliminating one-to-one correspo…

    Submitted 13 October, 2022; originally announced October 2022.

  11. arXiv:2209.07611  [pdf, other]

    cs.CL cs.AI

    Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties

    Authors: Tessa Masis, Anissa Neal, Lisa Green, Brendan O'Connor

    Abstract: The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature's distribution across speakers, top…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Field Matters Workshop at COLING 2022

  12. arXiv:2209.03116  [pdf, ps, other]

    cs.CV

    A New Method for the High-Precision Assessment of Tumor Changes in Response to Treatment

    Authors: P. D. Tar, N. A. Thacker, J. P. B. O'Connor

    Abstract: Imaging demonstrates that preclinical and human tumors are heterogeneous, i.e. a single tumor can exhibit multiple regions that behave differently during both normal development and also in response to treatment. The large variations observed in control group tumors can obscure detection of significant therapeutic effects due to the ambiguity in attributing causes of change. This can hinder develo…

    Submitted 7 September, 2022; originally announced September 2022.

  13. arXiv:2204.04694  [pdf, other]

    cs.HC cs.DL

    ClioQuery: Interactive Query-Oriented Text Analytics for Comprehensive Investigation of Historical News Archives

    Authors: Abram Handler, Narges Mahyar, Brendan O'Connor

    Abstract: Historians and archivists often find and analyze the occurrences of query words in newspaper archives, to help answer fundamental questions about society. But much work in text analytics focuses on helping people investigate other textual units, such as events, clusters, ranked documents, entity relationships, or thematic hierarchies. Informed by a study into the needs of historians and archivists…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Forthcoming in ACM Transactions on Interactive Intelligent Systems (TiiS)

  14. arXiv:2203.05097  [pdf]

    cs.DC

    A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments

    Authors: Robert L. Grossman, Rebecca R. Boyles, Brandi N. Davis-Dusenbery, Amanda Haddock, Allison P. Heath, Brian D. O'Connor, Adam C. Resnick, Deanne M. Taylor, Stan Ahalt

    Abstract: As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms, as a growing amount of data is being hosted in cloud-based platforms. A well accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies…

    Submitted 15 February, 2024; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 16 pages with 2 figures

    ACM Class: D.2.11; D.2.12; E.0

  15. arXiv:2111.08839  [pdf, other]

    cs.SD eess.AS

    Zero-shot Singing Technique Conversion

    Authors: Brendan O'Connor, Simon Dixon, George Fazekas

    Abstract: In this paper we propose modifications to the neural network framework, AutoVC for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a decoder is conditioned during training. By swapping out a source singer's technique information for that of the target's during conversion, the input spectrogram…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: In Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR 2021), Tokyo, Japan, November 15-16, 2021

  16. An Exploratory Study on Perceptual Spaces of the Singing Voice

    Authors: Brendan O'Connor, Simon Dixon, George Fazekas

    Abstract: Sixty participants provided dissimilarity ratings between various singing techniques. Multidimensional scaling, class averaging and clustering techniques were used to analyse timbral spaces and how they change between different singers, genders and registers. Clustering analysis showed that ground-truth similarity and silhouette scores that were not significantly different between gender or regist…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: In Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020), Stockholm, Sweden, October 15-19, 2020

  17. arXiv:2109.07542  [pdf, other]

    cs.CL

    Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects

    Authors: Katherine A. Keith, Douglas Rice, Brendan O'Connor

    Abstract: Using observed language to understand interpersonal interactions is important in high-stakes decision making. We propose a causal research design for observational (non-experimental) data to estimate the natural direct and indirect effects of social group signals (e.g. race or gender) on speakers' responses with separate aspects of language as causal mediators. We illustrate the promises and chall…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted to Causal Inference and NLP (CI+NLP) Workshop at EMNLP 2021

    Journal ref: Causal Inference and NLP (CI+NLP) Workshop at EMNLP 2021

  18. arXiv:2105.12936  [pdf, other]

    cs.CL

    Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

    Authors: Andrew Halterman, Katherine A. Keith, Sheikh Muhammad Sarwar, Brendan O'Connor

    Abstract: Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language Times of India articles ab…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: To appear in Findings of ACL 2021

    Journal ref: Findings of ACL 2021

  19. arXiv:2011.00092  [pdf, other]

    cs.CL

    Analyzing Gender Bias within Narrative Tropes

    Authors: Dhruvil Gala, Mohammad Omar Khursheed, Hannah Lerner, Brendan O'Connor, Mohit Iyyer

    Abstract: Popular media reflects and reinforces societal biases through the use of tropes, which are narrative elements, such as archetypal characters and plot arcs, that occur frequently across media. In this paper, we specifically investigate gender bias within a large collection of tropes. To enable our study, we crawl tvtropes.org, an online user-created repository that contains 30K tropes associated wi…

    Submitted 30 October, 2020; originally announced November 2020.

  20. arXiv:2010.04706  [pdf, other]

    cs.CL

    Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty

    Authors: Katherine A. Keith, Christoph Teichmann, Brendan O'Connor, Edgar Meij

    Abstract: Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive im…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted to the 2020 Natural Language Processing + Computational Social Science Workshop (NLP+CSS) at EMNLP

    Journal ref: 2020 Natural Language Processing + Computational Social Science Workshop (NLP+CSS) at EMNLP

  21. arXiv:2005.00649  [pdf, other]

    cs.CL

    Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates

    Authors: Katherine A. Keith, David Jensen, Brendan O'Connor

    Abstract: Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an indiv…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted to ACL 2020

    Journal ref: ACL 2020

  22. arXiv:1909.03343  [pdf, other]

    cs.CL

    Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts

    Authors: Jack Merullo, Luke Yeh, Abram Handler, Alvin Grissom II, Brendan O'Connor, Mohit Iyyer

    Abstract: Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes. Prior studies based on small datasets and manual coding show that such theatrics evince commentator bias in sports broadcasts. To examine this phenomenon, we assemble FOOTBALL, which contains 1,455 broadcast transcripts from American football games across…

    Submitted 18 October, 2019; v1 submitted 7 September, 2019; originally announced September 2019.

  23. arXiv:1904.09051  [pdf, other]

    cs.CL

    Query-focused Sentence Compression in Linear Time

    Authors: Abram Handler, Brendan O'Connor

    Abstract: Search applications often display shortened sentences which must contain certain query terms and must fit within the space constraints of a user interface. This work introduces a new transition-based sentence compression technique developed for such settings. Our query-focused method constructs length and lexically constrained compressions in linear time, by growing a subgraph in the dependency pa…

    Submitted 17 September, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: EMNLP 2019 (short paper)

  24. arXiv:1902.00489  [pdf, other]

    cs.CL

    Human acceptability judgements for extractive sentence compression

    Authors: Abram Handler, Brian Dillon, Brendan O'Connor

    Abstract: Recent approaches to English-language sentence compression rely on parallel corpora consisting of sentence-compression pairs. However, a sentence may be shortened in many different ways, which each might be suited to the needs of a particular application. Therefore, in this work, we collect and model crowdsourced judgements of the acceptability of many possible sentence shortenings. We then show h…

    Submitted 1 February, 2019; originally announced February 2019.

  25. arXiv:1809.02035  [pdf, other]

    cs.CL

    Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation

    Authors: Johnny Tian-Zheng Wei, Khiem Pham, Brian Dillon, Brendan O'Connor

    Abstract: Sequence to sequence (seq2seq) models are often employed in settings where the target output is natural language. However, the syntactic properties of the language generated from these models are not well understood. We explore whether such output belongs to a formal and realistic grammar, by employing the English Resource Grammar (ERG), a broad coverage, linguistically precise HPSG-based grammar…

    Submitted 6 September, 2018; originally announced September 2018.

  26. arXiv:1804.06004  [pdf, other]

    cs.CL

    Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses

    Authors: Katherine A. Keith, Su Lin Blodgett, Brendan O'Connor

    Abstract: Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: To appear in Proceedings of NAACL 2018

  27. arXiv:1708.01944  [pdf, other]

    cs.HC cs.CL

    Rookie: A unique approach for exploring news archives

    Authors: Abram Handler, Brendan O'Connor

    Abstract: News archives are an invaluable primary source for placing current events in historical context. But current search engine tools do a poor job at uncovering broad themes and narratives across documents. We present Rookie: a practical software system which uses natural language processing (NLP) to help readers, reporters and editors uncover broad stories in news archives. Unlike prior work, Rookie'…

    Submitted 6 August, 2017; originally announced August 2017.

    Comments: Presented at KDD 2017: Data Science + Journalism workshop

  28. arXiv:1707.07086  [pdf, other]

    cs.CL

    Identifying civilians killed by police with distantly supervised entity-event extraction

    Authors: Katherine A. Keith, Abram Handler, Michael Pinkham, Cara Magliozzi, Joshua McDuffie, Brendan O'Connor

    Abstract: We propose a new, socially-impactful task for natural language processing: from a news corpus, extract names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and present a model to solve this problem that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outper…

    Submitted 21 July, 2017; originally announced July 2017.

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

  29. arXiv:1707.00061  [pdf, other]

    cs.CY cs.CL

    Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English

    Authors: Su Lin Blodgett, Brendan O'Connor

    Abstract: We highlight an important frontier in algorithmic fairness: disparity in the quality of natural language processing algorithms when applied to language from authors of different social groups. For example, current systems sometimes analyze the language of females and minorities more poorly than they do of whites and males. We conduct an empirical analysis of racial disparity in language identifica…

    Submitted 30 June, 2017; originally announced July 2017.

    Comments: Presented as a talk at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

  30. arXiv:1608.08868  [pdf, other]

    cs.CL

    Demographic Dialectal Variation in Social Media: A Case Study of African-American English

    Authors: Su Lin Blodgett, Lisa Green, Brendan O'Connor

    Abstract: Though dialectal language is increasingly abundant on social media, few resources exist for developing NLP tools to handle such language. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter. We propose a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages,…

    Submitted 31 August, 2016; originally announced August 2016.

    Comments: To be published in EMNLP 2016, 15 pages

  31. arXiv:1606.06352  [pdf, other]

    stat.ML cs.CL cs.LG

    Visualizing textual models with in-text and word-as-pixel highlighting

    Authors: Abram Handler, Su Lin Blodgett, Brendan O'Connor

    Abstract: We explore two techniques which use color to make sense of statistical text models. One method uses in-text annotations to illustrate a model's view of particular tokens in particular documents. Another uses a high-level, "words-as-pixels" graphic to display an entire corpus. Together, these methods offer both zoomed-in and zoomed-out perspectives into a model's understanding of text. We show how…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

  32. arXiv:1508.05154  [pdf, other]

    cs.CL

    Posterior calibration and exploratory analysis for natural language processing models

    Authors: Khanh Nguyen, Brendan O'Connor

    Abstract: Many models in natural language processing define probabilistic distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a u…

    Submitted 2 September, 2015; v1 submitted 20 August, 2015; originally announced August 2015.

    Comments: 15 pages (including supplementary information), proceedings of EMNLP 2015

  33. arXiv:1310.1975  [pdf, ps, other]

    cs.CL

    ARKref: a rule-based coreference resolution system

    Authors: Brendan O'Connor, Michael Heilman

    Abstract: ARKref is a tool for noun phrase coreference. It is a deterministic, rule-based system that uses syntactic information from a constituent parser, and semantic information from an entity recognition component. Its architecture is based on the work of Haghighi and Klein (2009). ARKref was originally written in 2009. At the time of writing, the last released version was in March 2011. This document d…

    Submitted 7 October, 2013; originally announced October 2013.

  34. arXiv:1310.0519  [pdf]

    q-bio.NC cs.HC

    Evidence that Cross-Domain Re-interpretations of Creative Ideas are Recognizable

    Authors: Apara Ranjan, Liane Gabora, Brian O'Connor

    Abstract: The goal of this study was to investigate the translate-ability of creative works into other domains. We tested whether people were able to recognize which works of art were inspired by which pieces of music. Three expert painters created four paintings, each of which was the artist's interpretation of one of four different pieces of instrumental music. Participants were able to identify which pai…

    Submitted 9 July, 2019; v1 submitted 1 October, 2013; originally announced October 2013.

    Comments: 6 pages. arXiv admin note: substantial text overlap with arXiv:1308.4706

    Journal ref: In G. Stojanov & B. Indurkhya (Co-Chairs), Creativity and (early) cognitive development. Symposium conducted at the meeting of Association for the Advancement of Artificial Intelligence (AAAI), Palo Alto, CA. (2013)

  35. arXiv:1307.7382  [pdf, other]

    cs.CL

    Learning Frames from Text with an Unsupervised Latent Variable Model

    Authors: Brendan O'Connor

    Abstract: We develop a probabilistic latent-variable model to discover semantic frames---types of events and their participants---from corpora. We present a Dirichlet-multinomial model in which frames are latent categories that explain the linking of verb-subject-object triples, given document-level sparsity. We analyze what the model learns, and compare it to FrameNet, noting it learns some novel and inter…

    Submitted 28 July, 2013; originally announced July 2013.

    Comments: 21 pages; technical report for Data Analysis Project requirement, Machine Learning Department, Carnegie Mellon University

  36. arXiv:1306.2091  [pdf, other]

    cs.CL

    A framework for (under)specifying dependency syntax without overloading annotators

    Authors: Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge

    Abstract: We introduce a framework for lightweight dependency syntax annotation. Our formalism builds upon the typical representation for unlabeled dependencies, permitting a simple notation and annotation workflow. Moreover, the formalism encourages annotators to underspecify parts of the syntax if doing so would streamline the annotation process. We demonstrate the efficacy of this annotation on three lan…

    Submitted 14 June, 2013; v1 submitted 9 June, 2013; originally announced June 2013.

    Comments: This is an expanded version of a paper appearing in Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, Sofia, Bulgaria, August 8-9, 2013

  37. arXiv:1302.3912  [pdf]

    cs.HC cs.CY cs.SI

    An Online Environment for Democratic Deliberation: Motivations, Principles, and Design

    Authors: Todd Davies, Brendan O'Connor, Alex Cochran, Jonathan J. Effrat, Andrew Parker, Benjamin Newman, Aaron Tam

    Abstract: We have created a platform for online deliberation called Deme (which rhymes with 'team'). Deme is designed to allow groups of people to engage in collaborative drafting, focused discussion, and decision making using the Internet. The Deme project has evolved greatly from its beginning in 2003. This chapter outlines the thinking behind Deme's initial design: our motivations for creating it, the pr…

    Submitted 15 February, 2013; originally announced February 2013.

    Comments: Appeared in Todd Davies and Seeta Peña Gangadharan (Editors), Online Deliberation: Design, Research, and Practice, CSLI Publications/University of Chicago Press, October 2009, pp. 275-292; 18 pages, 3 figures

    ACM Class: H.5.3; K.4.1; K.4.3

  38. arXiv:1302.3545  [pdf]

    cs.HC

    Displaying Asynchronous Reactions to a Document: Two Goals and a Design

    Authors: Todd Davies, Benjamin Newman, Brendan O'Connor, Aaron Tam, Leo Perry

    Abstract: We describe and motivate three goals for the screen display of asynchronous text deliberation pertaining to a document: (1) visibility of relationships between comments and the text they reference, between different comments, and between group members and the document and discussion, and (2) distinguishability of boundaries between contextually related and unrelated text and comments and between i…

    Submitted 14 February, 2013; originally announced February 2013.

    Comments: Appeared as a Poster Paper, Conference on Computer Supported Cooperative Work, 20th Anniversary - Conference Supplement (CSCW 2006, Banff, November 4-8, 2006), pp. 169-170; Modified as "Document Centered Discussion: A Design Pattern for Online Deliberation", in D. Schuler, Liberating Voices: A Pattern Language for Communication Revolution, MIT Press, 2008, pp. 384-386; 2 pages, 1 figure, 1 table

    ACM Class: H.5.3; I.7.1

  39. arXiv:1302.3209  [pdf]

    cs.HC cs.SI

    "Groupware for Groups": Problem-Driven Design in Deme

    Authors: Todd Davies, Brendan O'Connor, Alex Cochran, Andrew Parker

    Abstract: Design choices can be clarified when group interaction software is directed at solving the interaction needs of particular groups that pre-date the groupware. We describe an example: the Deme platform for online deliberation. Traditional threaded conversation systems are insufficient for solving the problem at which Deme is aimed, namely, that the democratic process in grassroots community groups…

    Submitted 13 February, 2013; originally announced February 2013.

    Comments: Position paper from the Beyond Threaded Conversation Workshop at CHI 2005, Portland, Oregon, April 3, 2005; 3 pages, 2 figures

    ACM Class: H.4.1; H.5.3; I.5.2; H.5.2

  40. arXiv:1210.5268  [pdf, other]

    cs.CL cs.SI physics.soc-ph

    Diffusion of Lexical Change in Social Media

    Authors: Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing

    Abstract: Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change…

    Submitted 23 November, 2014; v1 submitted 18 October, 2012; originally announced October 2012.

    Comments: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e113114