Showing 1–50 of 110 results for author: Daume, H

  1. arXiv:2506.13468  [pdf, ps, other]

    cs.CL cs.AI

    An Interdisciplinary Approach to Human-Centered Machine Translation

    Authors: Marine Carpuat, Omri Asscher, Kalika Bali, Luisa Bentivogli, Frédéric Blain, Lynne Bowker, Monojit Choudhury, Hal Daumé III, Kevin Duh, Ge Gao, Alvin Grissom II, Marzena Karpinska, Elaine C. Khoong, William D. Lewis, André F. T. Martins, Mary Nurminen, Douglas W. Oard, Maja Popovic, Michel Simard, François Yvon

    Abstract: Machine Translation (MT) tools are widely used today, often in contexts where professional translators are not present. Despite progress in MT technology, a gap persists between system development and real-world usage, particularly for non-expert users who may struggle to assess translation reliability. This paper advocates for a human-centered approach to MT, emphasizing the alignment of system d…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 20 pages

  2. arXiv:2505.19317  [pdf, other]

    cs.AI cs.CY cs.HC cs.LG

    Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics

    Authors: Tin Nguyen, Jiannan Xu, Zora Che, Phuong-Anh Nguyen-Le, Rushil Dandamudi, Donald Braman, Furong Huang, Hal Daumé III, Zubin Jelveh

    Abstract: Although popularized AI fairness metrics, e.g., demographic parity, have uncovered bias in AI-assisted decision-making outcomes, they do not consider how much effort one has spent to get to where one is today in the input feature space. However, the notion of effort is important in how Philosophy and humans understand fairness. We propose a philosophy-informed way to conceptualize and evaluate Eff…

    Submitted 25 May, 2025; originally announced May 2025.

  3. arXiv:2505.19299  [pdf, ps, other]

    cs.CL cs.AI

    A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations

    Authors: Lingjun Zhao, Hal Daumé III

    Abstract: Faithful free-text explanations are important to ensure transparency in high-stakes AI decision-making contexts, but they are challenging to generate by language models and assess by humans. In this paper, we present a measure for Prediction-EXplanation (PEX) consistency, by extending the concept of weight of evidence. This measure quantifies how much a free-text explanation supports or opposes a…

    Submitted 25 May, 2025; originally announced May 2025.

  4. arXiv:2505.09868  [pdf, other]

    cs.CY cs.AI cs.HC

    Which Demographic Features Are Relevant for Individual Fairness Evaluation of U.S. Recidivism Risk Assessment Tools?

    Authors: Tin Trung Nguyen, Jiannan Xu, Phuong-Anh Nguyen-Le, Jonathan Lazar, Donald Braman, Hal Daumé III, Zubin Jelveh

    Abstract: Despite its constitutional relevance, the technical "individual fairness" criterion has not been operationalized in U.S. state or federal statutes/regulations. We conduct a human subjects experiment to address this gap, evaluating which demographic features are relevant for individual fairness evaluation of recidivism risk assessment (RRA) tools. Our analyses conclude that the individual similar…

    Submitted 26 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.02749  [pdf, other]

    cs.CY

    How May U.S. Courts Scrutinize Their Recidivism Risk Assessment Tools? Contextualizing AI Fairness Criteria on a Judicial Scrutiny-based Framework

    Authors: Tin Nguyen, Jiannan Xu, Phuong-Anh Nguyen-Le, Jonathan Lazar, Donald Braman, Hal Daumé III, Zubin Jelveh

    Abstract: The AI/HCI and legal communities have developed largely independent conceptualizations of fairness. This conceptual difference hinders the potential incorporation of technical fairness criteria (e.g., procedural, group, and individual fairness) into sustainable policies and designs, particularly for high-stakes applications like recidivism risk assessment. To foster common ground, we conduct legal…

    Submitted 26 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  6. arXiv:2503.01030  [pdf, other]

    cs.CL cs.AI

    Language Models Predict Empathy Gaps Between Social In-groups and Out-groups

    Authors: Yu Hou, Hal Daumé III, Rachel Rudinger

    Abstract: Studies of human psychology have demonstrated that people are more motivated to extend empathy to in-group members than out-group members (Cikara et al., 2011). In this study, we investigate how this aspect of intergroup relations in humans is replicated by LLMs in an emotion intensity prediction task. In this task, the LLM is given a short description of an experience a person had that caused the…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: NAACL 2025

  7. arXiv:2502.15079  [pdf, other]

    cs.CV cs.AI cs.CL

    Can Hallucination Correction Improve Video-Language Alignment?

    Authors: Lingjun Zhao, Mingyang Xie, Paola Cascante-Bonilla, Hal Daumé III, Kwonjoon Lee

    Abstract: Large Vision-Language Models often generate hallucinated content that is not grounded in its visual inputs. While prior work focuses on mitigating hallucinations, we instead explore leveraging hallucination correction as a training objective to improve video-language alignment. We introduce HACA, a self-training framework learning to correct hallucinations in descriptions that do not align with th…

    Submitted 20 February, 2025; originally announced February 2025.

  8. arXiv:2502.04564  [pdf, other]

    cs.CL

    My LLM might Mimic AAE -- But When Should it?

    Authors: Sandra C. Sandoval, Christabel Acquaye, Kwesi Cobbina, Mohammad Nayeem Teli, Hal Daumé III

    Abstract: We examine the representation of African American English (AAE) in large language models (LLMs), exploring (a) the perceptions Black Americans have of how effective these technologies are at producing authentic AAE, and (b) in what contexts Black Americans find this desirable. Through both a survey of Black Americans ($n=$ 104) and annotation of LLM-produced AAE by Black Americans ($n=$ 228), we f…

    Submitted 10 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: Accepted to NAACL 2025

  9. arXiv:2412.10345  [pdf, ps, other]

    cs.RO cs.AI

    TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

    Authors: Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang

    Abstract: Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, making them less effective in handling complex tasks, such as manipulation. In this work, we introduce visual trace prompting, a simple yet effective approach to facilitate VLA…

    Submitted 5 June, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

  10. arXiv:2411.11437  [pdf, other]

    cs.DL cs.CL stat.AP

    Causal Effect of Group Diversity on Redundancy and Coverage in Peer-Reviewing

    Authors: Navita Goyal, Ivan Stelmakh, Nihar Shah, Hal Daumé III

    Abstract: A large host of scientific journals and conferences solicit peer reviews from multiple reviewers for the same submission, aiming to gather a broader range of perspectives and mitigate individual biases. In this work, we reflect on the role of diversity in the slate of reviewers assigned to evaluate a submitted paper as a factor in diversifying perspectives and improving the utility of the peer-rev…

    Submitted 18 November, 2024; originally announced November 2024.

  11. arXiv:2411.05783  [pdf, other]

    cs.CL cs.AI cs.CV cs.HC

    ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles

    Authors: Kayo Yin, Chinmay Singh, Fyodor O. Minakov, Vanessa Milan, Hal Daumé III, Cyril Zhang, Alex X. Lu, Danielle Bragg

    Abstract: Deaf and hard-of-hearing (DHH) students face significant barriers in accessing science, technology, engineering, and mathematics (STEM) education, notably due to the scarcity of STEM resources in signed languages. To help address this, we introduce ASL STEM Wiki: a parallel corpus of 254 Wikipedia articles on STEM topics in English, interpreted into over 300 hours of American Sign Language (ASL).…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: Accepted to EMNLP 2024

  12. arXiv:2410.22315  [pdf, other]

    cs.CL cs.CV

    Natural Language Inference Improves Compositionality in Vision-Language Models

    Authors: Paola Cascante-Bonilla, Yu Hou, Yang Trista Cao, Hal Daumé III, Rachel Rudinger

    Abstract: Compositional reasoning in Vision-Language Models (VLMs) remains challenging as these models often struggle to relate objects, attributes, and spatial relationships. Recent methods aim to address these limitations by relying on the semantics of the textual description, using Large Language Models (LLMs) to break them down into subsets of questions and answers. However, these methods primarily oper…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: Project page: https://cece-vlm.github.io/

  13. arXiv:2410.06524  [pdf, other]

    cs.CL cs.AI cs.LG

    Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

    Authors: Maharshi Gor, Hal Daumé III, Tianyi Zhou, Jordan Boyd-Graber

    Abstract: Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning. This work investigates these assertions by introducing CAIMIRA, a novel framework rooted in item response theory (IRT) that enables quantitative assessment and comparison of problem-solving abilities of question-answe…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: To appear at EMNLP 2024 (Main)

  14. arXiv:2409.20390  [pdf, other]

    cs.CL

    Anti-stereotypical Predictive Text Suggestions Do Not Reliably Yield Anti-stereotypical Writing

    Authors: Connor Baumler, Hal Daumé III

    Abstract: AI-based systems such as language models can replicate and amplify social biases reflected in their training data. Among other questionable behavior, this can lead to LM-generated text--and text suggestions--that contain normatively inappropriate stereotypical associations. In this paper, we consider the question of how "debiasing" a language model impacts stories that people write using that lang…

    Submitted 30 September, 2024; originally announced September 2024.

  15. arXiv:2406.12232  [pdf, other]

    cs.AI cs.CL

    "You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations

    Authors: Huy Nghiem, John Prindle, Jieyu Zhao, Hal Daumé III

    Abstract: Social science research has shown that candidates with names indicative of certain races or genders often face discrimination in employment practices. Similarly, Large Language Models (LLMs) have demonstrated racial and gender biases in various applications. In this study, we utilize GPT-3.5-Turbo and Llama 3-70B-Instruct to simulate hiring decisions and salary recommendations for candidates with…

    Submitted 5 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024, 20 pages

  16. arXiv:2403.11456  [pdf, other]

    cs.CL cs.AI cs.SI

    HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models

    Authors: Huy Nghiem, Hal Daumé III

    Abstract: The widespread use of social media necessitates reliable and efficient detection of offensive content to mitigate harmful effects. Although sophisticated models perform well on individual datasets, they often fail to generalize due to varying definitions and labeling of "offensive content." In this paper, we introduce HateCOT, an English dataset with over 52,000 samples from diverse sources, featu…

    Submitted 5 October, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: EMNLP 2024 Findings

  17. arXiv:2403.01015  [pdf, other]

    cs.CY cs.DL

    A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions

    Authors: Charvi Rastogi, Xiangchen Song, Zhijing Jin, Ivan Stelmakh, Hal Daumé III, Kun Zhang, Nihar B. Shah

    Abstract: Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 18 pages, 4 figures, 3 tables

  18. arXiv:2402.16973  [pdf, other]

    cs.AI cs.CL cs.HC

    Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections

    Authors: Lingjun Zhao, Khanh Nguyen, Hal Daumé III

    Abstract: Language models will inevitably err in situations with which they are unfamiliar. However, by effectively communicating uncertainties, they can still guide humans toward making sound decisions in those contexts. We demonstrate this idea by developing HEAR, a system that can successfully guide humans in simulated residential environments despite generating potentially inaccurate instructions. Diver…

    Submitted 4 October, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: EMNLP 2024

  19. arXiv:2402.10450  [pdf, other]

    cs.LG

    PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

    Authors: Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov

    Abstract: Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) --…

    Submitted 6 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted at the Forty-first International Conference on Machine Learning (ICML 2024)

  20. arXiv:2402.06187  [pdf, other]

    cs.LG cs.AI cs.RO

    Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

    Authors: Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang

    Abstract: We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the…

    Submitted 23 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted at Forty-first International Conference on Machine Learning (ICML 2024)

  21. arXiv:2312.07141  [pdf, other]

    cs.CL

    Multilingual large language models leak human stereotypes across language boundaries

    Authors: Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

    Abstract: Multilingual large language models have gained prominence for their proficiency in processing and generating text across languages. Like their monolingual counterparts, multilingual models are likely to pick up on stereotypes and other social biases present in their training data. In this paper, we study a phenomenon we term stereotype leakage, which refers to how training a model multilingually m…

    Submitted 19 November, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  22. arXiv:2311.07879  [pdf, other]

    cs.CL cs.AI

    Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators

    Authors: Yang Trista Cao, Lovely-Frances Domingo, Sarah Ann Gilbert, Michelle Mazurek, Katie Shilton, Hal Daumé III

    Abstract: Extensive efforts in automated approaches for content moderation have been focused on developing models to identify toxic, offensive, and hateful content with the aim of lightening the load for moderators. Yet, it remains uncertain whether improvements on those tasks have truly addressed moderators' needs in accomplishing their work. In this paper, we surface gaps between past research efforts tha…

    Submitted 13 November, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  23. arXiv:2310.19668  [pdf, other]

    cs.LG cs.CV

    DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

    Authors: Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze, Hal Daumé III, Furong Huang, Huazhe Xu

    Abstract: Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite its progress, current algorithms are still unsatisfactory in virtually every aspect of the performance such as sample efficiency, asymptotic performance, and their robustness to the choice of random seeds. In this paper, we identify a major shortcoming in existing visual RL methods that is the agents often ex…

    Submitted 13 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024)

  24. arXiv:2310.15319  [pdf, other]

    cs.CL cs.AI cs.LG

    Hallucination Detection for Grounded Instruction Generation

    Authors: Lingjun Zhao, Khanh Nguyen, Hal Daumé III

    Abstract: We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a mo…

    Submitted 23 October, 2023; originally announced October 2023.

  25. arXiv:2310.15055  [pdf, other]

    cs.CL cs.AI cs.HC

    Towards Conceptualization of "Fair Explanation": Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators

    Authors: Tin Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé III, Marine Carpuat

    Abstract: Recent research at the intersection of AI explainability and fairness has focused on how explanations can improve human-plus-AI task performance as assessed by fairness measures. We propose to characterize what constitutes an explanation that is itself "fair" -- an explanation that does not adversely impact specific populations. We formulate a novel evaluation method of "fair explanations" using n…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main Conference (Long Paper)

  26. arXiv:2310.13004  [pdf, other]

    cs.LG cs.AI cs.HC

    Progressively Efficient Learning

    Authors: Ruijie Zheng, Khanh Nguyen, Hal Daumé III, Furong Huang, Karthik Narasimhan

    Abstract: Assistant AI agents should be capable of rapidly acquiring novel skills and adapting to new user preferences. Traditional frameworks like imitation learning and reinforcement learning do not facilitate this capability because they support only low-level, inefficient forms of communication. In contrast, humans communicate with progressive efficiency by defining and sharing abstract intentions. Repr…

    Submitted 13 October, 2023; originally announced October 2023.

  27. arXiv:2310.12558  [pdf, other]

    cs.CL cs.HC

    Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong

    Authors: Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber

    Abstract: Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they get, LLMs should not only provide information but also help users fact-check it. Our experiments with 80 crowdworkers compare language models with search engines (information retrieva…

    Submitted 1 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: NAACL 2024

  28. The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

    Authors: Navita Goyal, Connor Baumler, Tin Nguyen, Hal Daumé III

    Abstract: AI systems have been known to amplify biases in real-world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias, but when biases are realized through proxy features, the relationship between t…

    Submitted 14 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: IUI 2024

  29. arXiv:2306.13229  [pdf, other]

    cs.LG cs.AI

    TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

    Authors: Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang

    Abstract: Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future state prediction. However, these objectives are often insuffici…

    Submitted 23 May, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: Accepted at 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  30. arXiv:2306.05949   

    cs.CY cs.AI

    Evaluating the Social Impact of Generative AI Systems in Systems and Society

    Authors: Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, et al. (6 additional authors not shown)

    Abstract: Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categor…

    Submitted 28 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: This version has been removed by arXiv administrators as the submitter did not have the right to agree to the license at the time of submission

  31. arXiv:2305.14331  [pdf, other]

    cs.CL cs.AI

    What Else Do I Need to Know? The Effect of Background Information on Users' Reliance on QA Systems

    Authors: Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire Bonial, Jeffrey Micher, Clare R. Voss, Marine Carpuat, Hal Daumé III

    Abstract: NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with the increasingly large models, it is impossible and often undesirable to constrain models' knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the us…

    Submitted 25 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  32. arXiv:2305.09022  [pdf, other]

    cs.CL

    It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance

    Authors: Arjun Subramonian, Xingdi Yuan, Hal Daumé III, Su Lin Blodgett

    Abstract: Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance…

    Submitted 15 May, 2023; originally announced May 2023.

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  33. arXiv:2304.05934  [pdf, other]

    cs.CV cs.CL

    ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition

    Authors: Aashaka Desai, Lauren Berger, Fyodor O. Minakov, Vanessa Milan, Chinmay Singh, Kriston Pumphrey, Richard E. Ladner, Hal Daumé III, Alex X. Lu, Naomi Caselli, Danielle Bragg

    Abstract: Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition (ISLR) dataset, collected with consent and containing 83,399 videos for 2,73…

    Submitted 19 June, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

  34. arXiv:2301.05149  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG cs.RO

    Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models

    Authors: Lingjun Zhao, Khanh Nguyen, Hal Daumé III

    Abstract: Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-or…

    Submitted 28 May, 2023; v1 submitted 20 December, 2022; originally announced January 2023.

    Comments: Findings of ACL 2023

  35. arXiv:2211.12966  [pdf, other]

    cs.LG cs.DB cs.DL

    How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

    Authors: Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

    Abstract: How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based…

    Submitted 22 November, 2022; originally announced November 2022.

  36. arXiv:2211.06753  [pdf, other]

    cs.HC cs.AI

    Seamful XAI: Operationalizing Seamful Design in Explainable AI

    Authors: Upol Ehsan, Q. Vera Liao, Samir Passi, Mark O. Riedl, Hal Daume III

    Abstract: Mistakes in AI systems are inevitable, arising from both technical limitations and sociotechnical gaps. While black-boxing AI systems can make the user experience seamless, hiding the seams risks disempowering users to mitigate fallouts from AI mistakes. Instead of hiding these AI imperfections, can we leverage them to help the user? While Explainable AI (XAI) has predominantly tackled algorithmic…

    Submitted 5 March, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

    Journal ref: ACM CSCW 2024

  37. arXiv:2210.14966  [pdf, other]

    cs.CL cs.AI cs.CV

    What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?

    Authors: Yang Trista Cao, Kyle Seelman, Kyungjun Lee, Hal Daumé III

    Abstract: In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machin…

    Submitted 26 October, 2022; originally announced October 2022.

    Journal ref: AACL-IJCNLP 2022 The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing

  38. arXiv:2206.11684  [pdf, other]

    cs.CL

    Theory-Grounded Measurement of U.S. Social Stereotypes in English Language Models

    Authors: Yang Trista Cao, Anna Sotnikova, Hal Daumé III, Rachel Rudinger, Linda Zou

    Abstract: NLP models trained on text have been shown to reproduce human stereotypes, which can magnify harms to marginalized groups when systems are deployed at scale. We adapt the Agency-Belief-Communion (ABC) stereotype model of Koch et al. (2016) from social psychology as a framework for the systematic study and discovery of stereotypic group-trait associations in language models (LMs). We introduce the…

    Submitted 23 June, 2022; originally announced June 2022.

  39. arXiv:2205.06828  [pdf, other]

    cs.CL cs.AI

    Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications

    Authors: Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé III, Kaheer Suleman, Alexandra Olteanu

    Abstract: There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints -- which inform decisions about what, when, an…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: Camera Ready for NAACL 2022 (Main Conference)

  40. arXiv:2110.08258  [pdf, other]

    cs.LG cs.AI cs.HC cs.RO

    A Framework for Learning to Request Rich and Contextually Useful Information from Humans

    Authors: Khanh Nguyen, Yonatan Bisk, Hal Daumé III

    Abstract: When deployed, AI agents will encounter problems that are beyond their autonomous problem-solving capabilities. Leveraging human assistance can help agents overcome their inherent limitations and robustly cope with unfamiliar situations. We present a general interactive framework that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowled…

    Submitted 22 June, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to ICML 2022

  41. arXiv:2110.04889  [pdf, other]

    cs.CL

    Distantly-Supervised Evidence Retrieval Enables Question Answering without Evidence Annotation

    Authors: Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III

    Abstract: Open-domain question answering answers a question based on evidence retrieved from a large corpus. State-of-the-art neural approaches require intermediate evidence annotations for training. However, such intermediate annotations are expensive, and methods that rely on them cannot transfer to the more common setting, where only question-answer pairs are available. This paper investigates whether mo…

    Submitted 10 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021

  42. arXiv:2104.13299  [pdf, other]

    cs.AI cs.LG

    From Human Explanation to Model Interpretability: A Framework Based on Weight of Evidence

    Authors: David Alvarez-Melis, Harmanpreet Kaur, Hal Daumé III, Hanna Wallach, Jennifer Wortman Vaughan

    Abstract: We take inspiration from the study of human explanation to inform the design and evaluation of interpretability methods in machine learning. First, we survey the literature on human explanation in philosophy, cognitive science, and the social sciences, and propose a list of design principles for machine-generated explanations that are meaningful to humans. Using the concept of weight of evidence f…

    Submitted 20 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: HCOMP 2021

  43. arXiv:2104.05883  [pdf, other]

    cs.CL

    Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval

    Authors: Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, Hal Daumé III

    Abstract: Complex question answering often requires finding a reasoning chain that consists of multiple evidence pieces. Current approaches incorporate the strengths of structured knowledge and unstructured text, assuming text corpora is semi-structured. Building on dense retrieval methods, we propose a new multi-step retrieval approach (BeamDR) that iteratively forms an evidence chain through beam search i…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  44. arXiv:2011.15083  [pdf, other]

    cs.HC cs.LG stat.AP

    A Large Scale Randomized Controlled Trial on Herding in Peer-Review Discussions

    Authors: Ivan Stelmakh, Charvi Rastogi, Nihar B. Shah, Aarti Singh, Hal Daumé III

    Abstract: Peer review is the backbone of academia and humans constitute a cornerstone of this process, being responsible for reviewing papers and making the final acceptance/rejection decisions. Given that human decision making is known to be susceptible to various cognitive biases, it is important to understand which (if any) biases are present in the peer-review process and design the pipeline such that t…

    Submitted 30 November, 2020; originally announced November 2020.

  45. arXiv:2011.15050  [pdf, other]

    cs.HC cs.LG

    A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences

    Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh, Hal Daumé III

    Abstract: Conference peer review constitutes a human-computation process whose importance cannot be overstated: not only it identifies the best submissions for acceptance, but, ultimately, it impacts the future of the whole research area by promoting some ideas and restraining others. A surge in the number of submissions received by leading AI conferences has challenged the sustainability of the review proc…

    Submitted 30 November, 2020; originally announced November 2020.

  46. arXiv:2011.14646  [pdf, other]

    cs.DL cs.LG stat.AP

    Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in Conference Peer Review

    Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh, Hal Daumé III

    Abstract: Modern machine learning and computer science conferences are experiencing a surge in the number of submissions that challenges the quality of peer review as the number of competent reviewers is growing at a much slower rate. To curb this trend and reduce the burden on reviewers, several conferences have started encouraging or even requiring authors to declare the previous submission history of the…

    Submitted 30 November, 2020; originally announced November 2020.

  47. arXiv:2010.11246  [pdf, other]

    cs.CL cs.AI

    On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries

    Authors: Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III, Lillian Lee

    Abstract: Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce Squall, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created SQL equivalents plus alignments between SQL and q…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Findings of ACL: EMNLP 2020

    ACM Class: I.2.7

    Journal ref: Findings of ACL: EMNLP 2020

  48. arXiv:2006.07777  [pdf, other]

    cs.LG cs.HC stat.ML

    Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms

    Authors: Khanh Nguyen, Hal Daumé III

    Abstract: We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost. Rather than learning a specific policy as in standard imitation learning, the goal in this problem is to learn a distribution over a policy space. We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of…

    Submitted 13 June, 2020; originally announced June 2020.

  49. arXiv:2005.14050  [pdf, other]

    cs.CL cs.CY

    Language (Technology) is Power: A Critical Survey of "Bias" in NLP

    Authors: Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach

    Abstract: We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the rel…

    Submitted 29 May, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

  50. arXiv:2005.13718  [pdf, other]

    cs.CY cs.IR cs.LG

    Operationalizing the Legal Principle of Data Minimization for Personalization

    Authors: Asia J. Biega, Peter Potash, Hal Daumé III, Fernando Diaz, Michèle Finck

    Abstract: Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed (`data minimisation')". To date, the legal and computational definitions of `purpose limitation' and `data minimization' remain largely unclear. In particular, the…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: SIGIR 2020 paper: In Proc. of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval