Skip to main content

Showing 1–20 of 20 results for author: Schaekermann, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2505.24040  [pdf, ps, other

    cs.CL cs.AI

    MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering

    Authors: Yuexing Hao, Kumail Alhamoud, Hyewon Jeong, Haoran Zhang, Isha Puri, Philip Torr, Mike Schaekermann, Ariel D. Stern, Marzyeh Ghassemi

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on various medical question-answering (QA) benchmarks, including standardized medical exams. However, correct answers alone do not ensure correct logic, and models may reach accurate conclusions through flawed processes. In this study, we introduce the MedPAIR (Medical Dataset Comparing Physicians and AI Relevance Estimation and… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

  2. arXiv:2505.04653  [pdf, ps, other

    cs.CL cs.AI cs.CV cs.LG

    Advancing Conversational Diagnostic AI with Multimodal Reasoning

    Authors: Khaled Saab, Jan Freyberg, Chunjong Park, Tim Strother, Yong Cheng, Wei-Hung Weng, David G. T. Barrett, David Stutz, Nenad Tomasev, Anil Palepu, Valentin Liévin, Yash Sharma, Roma Ruparel, Abdullah Ahmed, Elahe Vedadi, Kimberly Kanada, Cian Hughes, Yun Liu, Geoff Brown, Yang Gao, Sean Li, S. Sara Mahdavi, James Manyika, Katherine Chou, Yossi Matias , et al. (11 additional authors not shown)

    Abstract: Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms permit clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in medical consultation, but the abil… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

  3. arXiv:2503.06074  [pdf, other

    cs.CL cs.AI cs.LG

    Towards Conversational AI for Disease Management

    Authors: Anil Palepu, Valentin Liévin, Wei-Hung Weng, Khaled Saab, David Stutz, Yong Cheng, Kavita Kulkarni, S. Sara Mahdavi, Joëlle Barral, Dale R. Webster, Katherine Chou, Avinatan Hassidim, Yossi Matias, James Manyika, Ryutaro Tanno, Vivek Natarajan, Adam Rodman, Tao Tu, Alan Karthikesalingam, Mike Schaekermann

    Abstract: While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic syste… ▽ More

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: 62 pages, 7 figures in main text, 36 figures in appendix

  4. arXiv:2412.16429  [pdf, other

    cs.CY cs.AI cs.LG

    LearnLM: Improving Gemini for Learning

    Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed , et al. (21 additional authors not shown)

    Abstract: Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of \textit{pedagogical instruction following}, where training and evaluation examples include system-level ins… ▽ More

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

  5. arXiv:2411.03395  [pdf, other

    cs.HC cs.CL

    Exploring Large Language Models for Specialist-level Oncology Care

    Authors: Anil Palepu, Vikram Dhillon, Polly Niravath, Wei-Hung Weng, Preethi Prasad, Khaled Saab, Ryutaro Tanno, Yong Cheng, Hanh Mai, Ethan Burns, Zainub Ajmal, Kavita Kulkarni, Philip Mansfield, Dale Webster, Joelle Barral, Juraj Gottweis, Mike Schaekermann, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Tao Tu

    Abstract: Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast… ▽ More

    Submitted 5 November, 2024; originally announced November 2024.

  6. arXiv:2410.03741  [pdf, other

    cs.HC cs.AI

    Towards Democratization of Subspeciality Medical Expertise

    Authors: Jack W. O'Sullivan, Anil Palepu, Khaled Saab, Wei-Hung Weng, Yong Cheng, Emily Chu, Yaanik Desai, Aly Elezaby, Daniel Seung Kim, Roy Lan, Wilson Tang, Natalie Tapaskar, Victoria Parikh, Sneha S. Jain, Kavita Kulkarni, Philip Mansfield, Dale Webster, Juraj Gottweis, Joelle Barral, Mike Schaekermann, Ryutaro Tanno, S. Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam, Euan Ashley , et al. (1 additional authors not shown)

    Abstract: The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI syst… ▽ More

    Submitted 1 October, 2024; originally announced October 2024.

  7. arXiv:2404.18416  [pdf, other

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby , et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G… ▽ More

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  8. A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models

    Authors: Stephen R. Pfohl, Heather Cole-Lewis, Rory Sayres, Darlene Neal, Mercy Asiedu, Awa Dieng, Nenad Tomasev, Qazi Mamunur Rashid, Shekoofeh Azizi, Negar Rostamzadeh, Liam G. McCoy, Leo Anthony Celi, Yun Liu, Mike Schaekermann, Alanna Walton, Alicia Parrish, Chirag Nagpal, Preeti Singh, Akeiylah Dewitt, Philip Mansfield, Sushant Prakash, Katherine Heller, Alan Karthikesalingam, Christopher Semturs, Joelle Barral , et al. (5 additional authors not shown)

    Abstract: Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms i… ▽ More

    Submitted 4 October, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Journal ref: Nature Medicine (2024)

  9. arXiv:2401.12032  [pdf, other

    cs.HC cs.AI

    MINT: A wrapper to make multi-modal and multi-image AI models interactive

    Authors: Jan Freyberg, Abhijit Guha Roy, Terry Spitz, Beverly Freeman, Mike Schaekermann, Patricia Strachan, Eva Schnider, Renee Wong, Dale R Webster, Alan Karthikesalingam, Yun Liu, Krishnamurthy Dvijotham, Umesh Telang

    Abstract: During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 15 pages, 7 figures

  10. arXiv:2401.05654  [pdf, other

    cs.AI cs.CL cs.LG

    Towards Conversational Diagnostic AI

    Authors: Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan

    Abstract: At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introdu… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 46 pages, 5 figures in main text, 19 figures in appendix

  11. arXiv:2312.00164  [pdf, other

    cs.CY cs.AI

    Towards Accurate Differential Diagnosis with Large Language Models

    Authors: Daniel McDuff, Mike Schaekermann, Tao Tu, Anil Palepu, Amy Wang, Jake Garrison, Karan Singhal, Yash Sharma, Shekoofeh Azizi, Kavita Kulkarni, Le Hou, Yong Cheng, Yun Liu, S Sara Mahdavi, Sushant Prakash, Anupam Pathak, Christopher Semturs, Shwetak Patel, Dale R Webster, Ewa Dominowska, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias , et al. (3 additional authors not shown)

    Abstract: An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM op… ▽ More

    Submitted 30 November, 2023; originally announced December 2023.

  12. arXiv:2311.18260  [pdf, other

    eess.IV cs.CL cs.CV cs.LG

    Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation

    Authors: Ryutaro Tanno, David G. T. Barrett, Andrew Sellergren, Sumedh Ghaisas, Sumanth Dathathri, Abigail See, Johannes Welbl, Karan Singhal, Shekoofeh Azizi, Tao Tu, Mike Schaekermann, Rhys May, Roy Lee, SiWai Man, Zahra Ahmed, Sara Mahdavi, Yossi Matias, Joelle Barral, Ali Eslami, Danielle Belgrave, Vivek Natarajan, Shravya Shetty, Pushmeet Kohli, Po-Sen Huang, Alan Karthikesalingam , et al. (1 additional authors not shown)

    Abstract: Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offer clear pote… ▽ More

    Submitted 20 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

  13. arXiv:2307.14334  [pdf, other

    cs.CL cs.CV

    Towards Generalist Biomedical AI

    Authors: Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral , et al. (7 additional authors not shown)

    Abstract: Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench… ▽ More

    Submitted 26 July, 2023; originally announced July 2023.

  14. arXiv:2307.02191  [pdf, other

    cs.LG cs.CV stat.ME stat.ML

    Evaluating AI systems under uncertain ground truth: a case study in dermatology

    Authors: David Stutz, Ali Taylan Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, Patricia Strachan, Mike Schaekermann, Jan Freyberg, Rajeev Rikhye, Beverly Freeman, Javier Perez Matos, Umesh Telang, Dale R. Webster, Yuan Liu, Greg S. Corrado, Yossi Matias, Pushmeet Kohli, Yun Liu, Arnaud Doucet, Alan Karthikesalingam

    Abstract: For safety, medical AI systems undergo thorough evaluations before deployment, validating their predictions against a ground truth which is assumed to be fixed and certain. However, this ground truth is often curated in the form of differential diagnoses. While a single differential diagnosis reflects the uncertainty in one expert assessment, multiple experts introduce another layer of uncertainty… ▽ More

    Submitted 13 April, 2025; v1 submitted 5 July, 2023; originally announced July 2023.

  15. arXiv:2305.09617  [pdf, other

    cs.CL cs.AI cs.LG

    Towards Expert-Level Medical Question Answering with Large Language Models

    Authors: Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal, Mike Schaekermann, Amy Wang, Mohamed Amin, Sami Lachgar, Philip Mansfield, Sushant Prakash, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Nenad Tomasev, Yun Liu, Renee Wong, Christopher Semturs, S. Sara Mahdavi, Joelle Barral , et al. (6 additional authors not shown)

    Abstract: Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM w… ▽ More

    Submitted 16 May, 2023; originally announced May 2023.

  16. arXiv:2112.02255  [pdf, other

    cs.HC cs.AI

    In Search of Ambiguity: A Three-Stage Workflow Design to Clarify Annotation Guidelines for Crowd Workers

    Authors: Vivek Krishna Pradhan, Mike Schaekermann, Matthew Lease

    Abstract: We propose a novel three-stage FIND-RESOLVE-LABEL workflow for crowdsourced annotation to reduce ambiguity in task instructions and thus improve annotation quality. Stage 1 (FIND) asks the crowd to find examples whose correct label seems ambiguous given task instructions. Workers are also asked to provide a short tag which describes the ambiguous concept embodied by the specific instance found. We… ▽ More

    Submitted 4 December, 2021; originally announced December 2021.

    Comments: 22 pages, 5 figures, 4 tables

  17. arXiv:2111.14322  [pdf, other

    cs.HC

    Proceedings of the CSCW 2021 Workshop -- Investigating and Mitigating Biases in Crowdsourced Data

    Authors: Danula Hettiachchi, Mark Sanderson, Jorge Goncalves, Simo Hosio, Gabriella Kazai, Matthew Lease, Mike Schaekermann, Emine Yilmaz

    Abstract: This volume contains the position papers presented at CSCW 2021 Workshop - Investigating and Mitigating Biases in Crowdsourced Data, held online on 23rd October 2021, at the 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2021). The workshop explored how specific crowdsourcing workflows, worker attributes, and work practices contribute to biases in data. The w… ▽ More

    Submitted 28 November, 2021; originally announced November 2021.

    Comments: 30 pages, More details available in the workshop website at https://sites.google.com/view/biases-in-crowdsourced-data

  18. arXiv:2111.10391  [pdf

    cs.LG cs.AI

    Data Excellence for AI: Why Should You Care

    Authors: Lora Aroyo, Matthew Lease, Praveen Paritosh, Mike Schaekermann

    Abstract: The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets define the entire world within which models exist and operate, yet research continues to focus on critiquing and improving the algorithmic aspect of the models… ▽ More

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: To appear in ACM Interactions, 29(2) March-April, 2022. 4 pages

    Journal ref: ACM Interactions, 29(2) March-April, 2022

  19. The Challenge of Variable Effort Crowdsourcing and How Visible Gold Can Help

    Authors: Danula Hettiachchi, Mike Schaekermann, Tristan McKinney, Matthew Lease

    Abstract: We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such tasks, some items require far more effort than others to annotate. Furthermore, the per-item annotation effort is not known until after each item is annotated s… ▽ More

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: 25 pages, To appear in the Proceedings of the ACM on Human-Computer Interaction, CSCW 2021

    Journal ref: Proc. ACM Hum.-Comput. Interact., 5(CSCW2), 26 (2021)

  20. Deep Learning and Glaucoma Specialists: The Relative Importance of Optic Disc Features to Predict Glaucoma Referral in Fundus Photos

    Authors: Sonia Phene, R. Carter Dunn, Naama Hammel, Yun Liu, Jonathan Krause, Naho Kitade, Mike Schaekermann, Rory Sayres, Derek J. Wu, Ashish Bora, Christopher Semturs, Anita Misra, Abigail E. Huang, Arielle Spitze, Felipe A. Medeiros, April Y. Maa, Monica Gandhi, Greg S. Corrado, Lily Peng, Dale R. Webster

    Abstract: Glaucoma is the leading cause of preventable, irreversible blindness world-wide. The disease can remain asymptomatic until severe, and an estimated 50%-90% of people with glaucoma remain undiagnosed. Glaucoma screening is recommended for early detection and treatment. A cost-effective tool to detect glaucoma could expand screening access to a much larger patient population, but such a tool is curr… ▽ More

    Submitted 30 August, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Journal ref: Ophthalmology (2019)