Showing 1–9 of 9 results for author: Guerdan, L

  1. arXiv:2503.05965  [pdf, other]

    cs.LG cs.CY cs.HC

    Validating LLM-as-a-Judge Systems in the Absence of Gold Labels

    Authors: Luke Guerdan, Solon Barocas, Kenneth Holstein, Hanna Wallach, Zhiwei Steven Wu, Alexandra Chouldechova

    Abstract: The LLM-as-a-judge paradigm, in which a judge LLM system replaces human raters in rating the outputs of other generative AI (GenAI) systems, has come to play a critical role in scaling and standardizing GenAI evaluations. To validate judge systems, evaluators collect multiple human ratings for each item in a validation corpus, and then aggregate the ratings into a single, per-item gold label ratin…

    Submitted 11 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  2. arXiv:2411.13760  [pdf, other]

    cs.LG cs.CL cs.HC

    A Framework for Evaluating LLMs Under Task Indeterminacy

    Authors: Luke Guerdan, Hanna Wallach, Solon Barocas, Alexandra Chouldechova

    Abstract: Large language model (LLM) evaluations often assume there is a single correct response -- a gold label -- for each item in the evaluation corpus. However, some tasks can be ambiguous -- i.e., they provide insufficient information to identify a unique interpretation -- or vague -- i.e., they do not clearly indicate where to draw the line when making a determination. Both ambiguity and vagueness can…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: To Appear in NeurIPS 2024 Workshops on Evaluating Evaluations (EvalEval) and Statistical Foundations of LLMs and Foundation Models (SFLLM)

  3. arXiv:2404.00848  [pdf, other]

    cs.LG cs.CY stat.ME

    Predictive Performance Comparison of Decision Policies Under Confounding

    Authors: Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

    Abstract: Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing decision-making policy that is generally under-specified and dependent on unobservable factors. These sources of uncertainty are often addressed in practice by maki…

    Submitted 11 June, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: ICML 2024

  4. arXiv:2308.15700  [pdf, other]

    cs.HC cs.AI cs.LG

    Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge

    Authors: Anna Kawakami, Luke Guerdan, Yanghuidi Cheng, Matthew Lee, Scott Carter, Nikos Arechiga, Kate Glazko, Haiyi Zhu, Kenneth Holstein

    Abstract: A growing body of research has explored how to support humans in making better use of AI-based decision support, including via training and onboarding. Existing research has focused on decision-making tasks where it is possible to evaluate "appropriate reliance" by comparing each decision against a ground truth label that cleanly maps to both the AI's predictive target and the human decision-maker…

    Submitted 29 August, 2023; originally announced August 2023.

  5. arXiv:2302.11121  [pdf, other]

    cs.LG cs.CY cs.HC stat.ME

    Counterfactual Prediction Under Outcome Measurement Error

    Authors: Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

    Abstract: Across domains such as medicine, employment, and criminal justice, predictive models often target labels that imperfectly reflect the outcomes of interest to experts and policymakers. For example, clinical risk assessments deployed to inform physician decision-making often predict measures of healthcare utilization (e.g., costs, hospitalization) as a proxy for patient medical need. These proxies c…

    Submitted 17 May, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: FAccT 2023

  6. arXiv:2302.06503  [pdf, other]

    cs.CY cs.AI cs.HC

    Ground(less) Truth: A Causal Framework for Proxy Labels in Human-Algorithm Decision-Making

    Authors: Luke Guerdan, Amanda Coston, Zhiwei Steven Wu, Kenneth Holstein

    Abstract: A growing literature on human-AI decision-making investigates strategies for combining human judgment with statistical models to improve decision-making. Research in this area often evaluates proposed improvements to models, interfaces, or workflows by demonstrating improved predictive performance on "ground truth" labels. However, this practice overlooks a key difference between human judgments a…

    Submitted 25 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: FAccT '23

  7. arXiv:2212.05588  [pdf, other]

    cs.HC

    Towards a Learner-Centered Explainable AI: Lessons from the learning sciences

    Authors: Anna Kawakami, Luke Guerdan, Yang Cheng, Anita Sun, Alison Hu, Kate Glazko, Nikos Arechiga, Matthew Lee, Scott Carter, Haiyi Zhu, Kenneth Holstein

    Abstract: In this short paper, we argue for a refocusing of XAI around human learning goals. Drawing upon approaches and theories from the learning sciences, we propose a framework for the learner-centered design and evaluation of XAI systems. We illustrate our framework through an ongoing case study in the context of AI-augmented social work.

    Submitted 11 December, 2022; originally announced December 2022.

    Comments: 7 pages, 2 figures

    Journal ref: Human-Centered Explainable AI Workshop at ACM CHI Conference on Human Factors in Computing Systems 2022

  8. arXiv:2201.05527  [pdf, other]

    cs.LG cs.HC

    Federated Continual Learning for Socially Aware Robotics

    Authors: Luke Guerdan, Hatice Gunes

    Abstract: From learning assistance to companionship, social robots promise to enhance many aspects of daily life. However, social robots have not seen widespread adoption, in part because (1) they do not adapt their behavior to new users, and (2) they do not provide sufficient privacy protections. Centralized learning, whereby robots develop skills by gathering data on a server, contributes to these limitat…

    Submitted 10 July, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

    Comments: IEEE RO-MAN '23

  9. arXiv:2106.08761  [pdf, other]

    cs.CV cs.HC

    Toward Affective XAI: Facial Affect Analysis for Understanding Explainable Human-AI Interactions

    Authors: Luke Guerdan, Alex Raymond, Hatice Gunes

    Abstract: As machine learning approaches are increasingly used to augment human decision-making, eXplainable Artificial Intelligence (XAI) research has explored methods for communicating system behavior to humans. However, these approaches often fail to account for the emotional responses of humans as they interact with explanations. Facial affect analysis, which examines human facial expressions of emotion…

    Submitted 15 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.