-
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
Authors:
Tobi Olatunji,
Charles Nimo,
Abraham Owodunni,
Tassallah Abdullahi,
Emmanuel Ayodele,
Mardhiyah Sanni,
Chinemelu Aka,
Folafunmi Omofoye,
Foutse Yuehgoh,
Timothy Faniran,
Bonaventure F. P. Dossou,
Moshood Yekini,
Jonas Kemp,
Katherine Heller,
Jude Chidubem Omeke,
Chidi Asuzu MD,
Naome A. Etori,
Aimérou Ndiaye,
Ifeoma Okoh,
Evans Doe Ocansey,
Wendy Kinara,
Michael Best,
Irfan Essa,
Stephen Edward Moore,
Chris Fourie,
et al. (1 additional author not shown)
Abstract:
Recent advancements in large language model (LLM) performance on medical multiple-choice question (MCQ) benchmarks have stimulated interest from healthcare providers and patients globally. Particularly in low- and middle-income countries (LMICs) facing acute physician shortages and a lack of specialists, LLMs offer a potentially scalable pathway to enhance healthcare access and reduce costs. However, their effectiveness in the Global South, especially across the African continent, remains to be established. In this work, we introduce AfriMed-QA, the first large-scale Pan-African English multi-specialty medical Question-Answering (QA) dataset, with 15,000 questions (open and closed-ended) sourced from over 60 medical schools across 16 countries, covering 32 medical specialties. We further evaluate 30 LLMs across multiple axes including correctness and demographic bias. Our findings show significant performance variation across specialties and geographies; MCQ performance clearly lags USMLE (MedQA). We find that biomedical LLMs underperform general models and that smaller edge-friendly LLMs struggle to achieve a passing score. Interestingly, human evaluations show a consistent consumer preference for LLM answers and explanations when compared with clinician answers.
Submitted 14 January, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.
-
Nteasee: Understanding Needs in AI for Health in Africa -- A Mixed-Methods Study of Expert and General Population Perspectives
Authors:
Mercy Nyamewaa Asiedu,
Iskandar Haykel,
Awa Dieng,
Kerrie Kauer,
Tousif Ahmed,
Florence Ofori,
Charisma Chan,
Stephen Pfohl,
Negar Rostamzadeh,
Katherine Heller
Abstract:
Artificial Intelligence (AI) for health has the potential to significantly change and improve healthcare. However, in most African countries, culturally and contextually attuned approaches for deploying these solutions are not well understood. To bridge this gap, we conduct a qualitative study to investigate the best practices, fairness indicators, and potential biases to mitigate when deploying AI for health in African countries, as well as explore opportunities where artificial intelligence could make a positive impact in health. We used a mixed-methods approach combining in-depth interviews (IDIs) and surveys. We conduct 1.5- to 2-hour IDIs with 50 experts in health, policy, and AI across 17 countries, and through an inductive approach we conduct a qualitative thematic analysis on expert IDI responses. We administer a blinded 30-minute survey with case studies to 672 general population participants across 5 countries in Africa and analyze responses on quantitative scales, statistically comparing responses by country, age, gender, and level of familiarity with AI. We thematically summarize open-ended responses from surveys. Our results show generally positive attitudes and high levels of trust, accompanied by moderate levels of concern, among general population participants for AI usage for health in Africa. This contrasts with expert responses, where major themes revolved around trust/mistrust, ethical concerns, and systemic barriers to integration, among others. This work presents the first-of-its-kind qualitative research study of the potential of AI for health in Africa from an algorithmic fairness angle, with perspectives from both experts and the general population. We hope that this work guides policymakers and drives home the need for further research and the inclusion of general population perspectives in decision-making around AI usage.
Submitted 25 May, 2025; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Machine Learning for Health symposium 2023 -- Findings track
Authors:
Stefan Hegselmann,
Antonio Parziale,
Divya Shanmugam,
Shengpu Tang,
Mercy Nyamewaa Asiedu,
Serina Chang,
Thomas Hartvigsen,
Harvineet Singh
Abstract:
A collection of the accepted Findings papers that were presented at the 3rd Machine Learning for Health symposium (ML4H 2023), which was held on December 10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality submissions on relevant problems in a variety of health-related disciplines including healthcare, biomedicine, and public health. Two submission tracks were offered: the archival Proceedings track and the non-archival Findings track. The Proceedings track targeted mature work with strong technical sophistication and a high impact on health. The Findings track looked for new ideas that could spark insightful discussion, serve as valuable resources for the community, or enable new collaborations. Submissions to the Proceedings track, if not accepted, were automatically considered for the Findings track. All manuscripts submitted to the ML4H Symposium underwent a double-blind peer-review process.
Submitted 15 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Globalizing Fairness Attributes in Machine Learning: A Case Study on Health in Africa
Authors:
Mercy Nyamewaa Asiedu,
Awa Dieng,
Abigail Oppong,
Maria Nagawa,
Sanmi Koyejo,
Katherine Heller
Abstract:
With growing machine learning (ML) applications in healthcare, there have been calls for fairness in ML to understand and mitigate ethical concerns these systems may pose. Fairness has implications for global health in Africa, which already has inequitable power imbalances between the Global North and South. This paper seeks to explore fairness for global health, with Africa as a case study. We propose fairness attributes for consideration in the African context and delineate where they may come into play in different ML-enabled medical modalities. This work serves as a basis and call for action for furthering research into fairness in global health.
Submitted 4 April, 2023;
originally announced April 2023.