Showing 1–13 of 13 results for author: Beeferman, D

Searching in archive cs.
  1. AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism

    Authors: William Brannon, Doug Beeferman, Hang Jiang, Andrew Heyward, Deb Roy

    Abstract: Understanding and making use of audience feedback is important but difficult for journalists, who now face an impractically large volume of audience comments online. We introduce AudienceView, an online tool to help journalists categorize and interpret this feedback by leveraging large language models (LLMs). AudienceView identifies themes and topics, connects them back to specific comments, provi…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted at CSCW Demo 2024. 5 pages, 2 figures

    Journal ref: Proc. CSCW (2024) 65-68

  2. Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use

    Authors: Hang Jiang, Doug Beeferman, William Brannon, Andrew Heyward, Deb Roy

    Abstract: Words often carry different meanings for people from diverse backgrounds. Today's era of social polarization demands that we choose words carefully to prevent miscommunication, especially in political communication and journalism. To address this issue, we introduce the Bridging Dictionary, an interactive tool designed to illuminate how words are perceived by people with different political views.…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted to CSCW Demo 2024

    Journal ref: Proc. CSCW (2024) 79-82

  3. arXiv:2306.15112

    cs.CL

    FeedbackMap: a tool for making sense of open-ended survey responses

    Authors: Doug Beeferman, Nabeel Gillani

    Abstract: Analyzing open-ended survey responses is a crucial yet challenging task for social scientists, non-profit organizations, and educational institutions, as they often face the trade-off between obtaining rich data and the burden of reading and coding textual responses. This demo introduces FeedbackMap, a web-based tool that uses natural language processing techniques to facilitate the analysis of op…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Demo at CSCW 2023

  4. arXiv:2304.08967

    cs.CY stat.AP

    All a-board: sharing educational data science research with school districts

    Authors: Nabeel Gillani, Doug Beeferman, Cassandra Overney, Christine Vega-Pourheydarian, Deb Roy

    Abstract: Educational data scientists often conduct research with the hopes of translating findings into lasting change through policy, civil society, or other channels. However, the bridge from research to practice can be fraught with sociopolitical frictions that impede, or altogether block, such translations -- especially when they are contentious or otherwise difficult to achieve. Focusing on one entren…

    Submitted 5 July, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: In Proceedings of the Tenth ACM Conference on Learning at Scale (L@S '23)

  5. arXiv:2303.07603

    cs.CY cs.AI

    Redrawing attendance boundaries to promote racial and ethnic diversity in elementary schools

    Authors: Nabeel Gillani, Doug Beeferman, Christine Vega-Pourheydarian, Cassandra Overney, Pascal Van Hentenryck, Deb Roy

    Abstract: Most US school districts draw "attendance boundaries" to define catchment areas that assign students to schools near their homes, often recapitulating neighborhood demographic segregation in schools. Focusing on elementary schools, we ask: how much might we reduce school segregation by redrawing attendance boundaries? Combining parent preference data with methods from combinatorial optimization, w…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Supplementary materials: https://drive.google.com/file/d/1OCV9fnv3m7jNMlwfPA8Mfi0s7BD5qtJN/view

  6. arXiv:2209.07065

    cs.SI cs.AI cs.CL

    CommunityLM: Probing Partisan Worldviews from Language Models

    Authors: Hang Jiang, Doug Beeferman, Brandon Roy, Deb Roy

    Abstract: As political attitudes have diverged ideologically in the United States, political speech has diverged linguistically. The ever-widening polarization between the US political parties is accelerated by an erosion of mutual understanding between them. We aim to make these communities more comprehensible to each other with a framework that probes community-specific responses to the same survey questi…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: Paper accepted by COLING 2022

  7. arXiv:2201.07281

    cs.CL

    Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis

    Authors: Hang Jiang, Yining Hua, Doug Beeferman, Deb Roy

    Abstract: Social media data such as Twitter messages ("tweets") pose a particular challenge to NLP systems because of their short, noisy, and colloquial nature. Tasks such as Named Entity Recognition (NER) and syntactic parsing require highly domain-matched training data for good performance. To date, there is no complete training corpus for both NER and syntactic analysis (e.g., part of speech tagging, dep…

    Submitted 10 May, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: Accepted at LREC 2022 (Long Papers)

  8. arXiv:2112.06166

    cs.CL

    Topic Detection and Tracking with Time-Aware Document Embeddings

    Authors: Hang Jiang, Doug Beeferman, Weiquan Mao, Deb Roy

    Abstract: The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into…

    Submitted 26 March, 2024; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: Accepted to LREC-COLING 2024

  9. arXiv:2111.02646

    cs.SI cs.CY

    Engaging Politically Diverse Audiences on Social Media

    Authors: Martin Saveski, Doug Beeferman, David McClure, Deb Roy

    Abstract: We study how political polarization is reflected in the social media posts used by media outlets to promote their content online. In particular, we track the Twitter posts of several media outlets over the course of more than three years (566K tweets), and the engagement with these tweets from other users (104M retweets), modeling the relationship between the tweet text and the political diversity…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: To appear in ICWSM'22 (International AAAI Conference on Web and Social Media)

  10. arXiv:2110.07337

    cs.IR cs.AI cs.HC

    Topic-time Heatmaps for Human-in-the-loop Topic Detection and Tracking

    Authors: Doug Beeferman, Hang Jiang

    Abstract: The essential task of Topic Detection and Tracking (TDT) is to organize a collection of news media into clusters of stories that pertain to the same real-world event. To apply TDT models to practical applications such as search engines and discovery tools, human guidance is needed to pin down the scope of an "event" for the corpus of interest. In this work in progress, we explore a human-in-the-lo…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted to DaSH Workshop, KDD 2021

  11. RadioTalk: a large-scale corpus of talk radio transcripts

    Authors: Doug Beeferman, William Brannon, Deb Roy

    Abstract: We introduce RadioTalk, a corpus of speech recognition transcripts sampled from talk radio broadcasts in the United States between October of 2018 and March of 2019. The corpus is intended for use by researchers in the fields of natural language processing, conversational analysis, and the social sciences. The corpus encompasses approximately 2.8 billion words of automatically transcribed speech f…

    Submitted 16 July, 2019; originally announced July 2019.

    Comments: 5 pages, 4 figures, accepted by Interspeech 2019

    Journal ref: Proc. Interspeech 2019, 564-568 (2019)

  12. A Model of Lexical Attraction and Repulsion

    Authors: Doug Beeferman, Adam Berger, John Lafferty

    Abstract: This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. Empirical data drawn from English and Japanese text, as well as co…

    Submitted 16 June, 1997; v1 submitted 12 June, 1997; originally announced June 1997.

    Comments: 8 pages, LaTeX source and postscript figures for ACL/EACL'97 paper

  13. Text Segmentation Using Exponential Models

    Authors: Doug Beeferman, Adam Berger, John Lafferty

    Abstract: This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large co…

    Submitted 12 June, 1997; v1 submitted 11 June, 1997; originally announced June 1997.

    Comments: 12 pages, LaTeX source and postscript figures for EMNLP-2 paper