Showing 1–4 of 4 results for author: Grolleau, F

  1. arXiv:2505.23802

    cs.CL cs.AI

    MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

    Authors: Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Michael Wornow, Juan M. Banda, Nikesh Kotecha, Timothy Keyes, Yifan Mai, Mert Oez, Hao Qiu, Shrey Jain, Leonardo Schettini, Mehr Kashyap, Jason Alan Fries, Akshay Swaminathan, Philip Chung, Fateme Nateghi, Asad Aali, Ashwin Nayak, Shivam Vedak, Sneha S. Jain, Birju Patel, Oluseyi Fayanju, Shreya Shah, et al. (56 additional authors not shown)

    Abstract: While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-validated taxonomy spanning 5 categories, 22 subcatego…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2501.03155

    stat.ME

    powerROC: An Interactive Web Tool for Sample Size Calculation in Assessing Models' Discriminative Abilities

    Authors: François Grolleau, Robert Tibshirani, Jonathan H. Chen

    Abstract: Rigorous external validation is crucial for assessing the generalizability of prediction models, particularly by evaluating their discrimination (AUROC) on new data. This often involves comparing a new model's AUROC to that of an established reference model. However, many studies rely on arbitrary rules of thumb for sample size calculations, often resulting in underpowered analyses and unreliable…

    Submitted 6 January, 2025; originally announced January 2025.

  3. arXiv:2405.02779

    stat.ME

    Estimating Complier Average Causal Effects with Mixtures of Experts

    Authors: François Grolleau, Céline Béji, Raphaël Porcher, François Petit

    Abstract: Treatment non-compliance, where individuals deviate from their assigned experimental conditions, frequently complicates the estimation of causal effects. To address this, we introduce a novel learning framework based on a mixture of experts architecture to estimate the Complier Average Causal Effect (CACE). Our framework provides a flexible alternative to classical instrumental variable methods by…

    Submitted 24 June, 2025; v1 submitted 4 May, 2024; originally announced May 2024.

  4. arXiv:2207.06275

    stat.ME

    A Comprehensive Framework for the Evaluation of Individual Treatment Rules From Observational Data

    Authors: François Grolleau, François Petit, Raphaël Porcher

    Abstract: Individualized treatment rules (ITRs) are deterministic decision rules that recommend treatments to individuals based on their characteristics. Though ubiquitous in medicine, ITRs are hardly ever evaluated in randomized controlled trials. To evaluate ITRs from observational data, we introduce a new probabilistic model and distinguish two situations: i) the situation of a newly developed ITR, where…

    Submitted 21 August, 2023; v1 submitted 13 July, 2022; originally announced July 2022.