-
In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review
Authors:
Amelia Jiménez-Sánchez,
Natalia-Rozalia Avlona,
Sarah de Boer,
Víctor M. Campello,
Aasa Feragen,
Enzo Ferrante,
Melanie Ganz,
Judy Wawira Gichoya,
Camila González,
Steff Groefsema,
Alessa Hering,
Adam Hulman,
Leo Joskowicz,
Dovile Juodelyte,
Melih Kandemir,
Thijs Kooi,
Jorge del Pozo Lérida,
Livie Yumeng Li,
Andre Pacheco,
Tim Rädsch,
Mauricio Reyes,
Théo Sourget,
Bram van Ginneken,
David Wen,
Nina Weng, et al. (4 additional authors not shown)
Abstract:
Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static -- they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings of datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifacts and datasets. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at http://inthepicture.itu.dk/.
Submitted 2 June, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
-
"It depends": Configuring AI to Improve Clinical Usefulness Across Contexts
Authors:
Hubert D. Zając,
Jorge M. N. Ribeiro,
Silvia Ingala,
Simona Gentile,
Ruth Wanjohi,
Samuel N. Gitau,
Jonathan F. Carlsen,
Michael B. Nielsen,
Tariq O. Andersen
Abstract:
Artificial Intelligence (AI) repeatedly matches or outperforms radiologists in lab experiments. However, real-world implementations of radiological AI-based systems are found to provide little to no clinical value. This paper explores how to design AI for clinical usefulness in different contexts. We conducted 19 design sessions and design interventions with 13 radiologists from 7 clinical sites in Denmark and Kenya, based on three iterations of a functional AI-based prototype. Ten sociotechnical dependencies were identified as crucial for the design of AI in radiology. We conceptualised four technical dimensions that must be configured to the intended clinical context of use: AI functionality, AI medical focus, AI decision threshold, and AI explainability. We present four design recommendations on how to address dependencies pertaining to the medical knowledge, clinic type, user expertise level, patient context, and user situation that condition the configuration of these technical dimensions.
Submitted 27 May, 2024;
originally announced July 2024.
-
Copycats: the many lives of a publicly available medical imaging dataset
Authors:
Amelia Jiménez-Sánchez,
Natalia-Rozalia Avlona,
Dovile Juodelyte,
Théo Sourget,
Caroline Vang-Larsen,
Anna Rogers,
Hubert Dariusz Zając,
Veronika Cheplygina
Abstract:
Medical Imaging (MI) datasets are fundamental to artificial intelligence in healthcare. The accuracy, robustness, and fairness of diagnostic algorithms depend on the data (and its quality) used to train and evaluate the models. MI datasets used to be proprietary, but have become increasingly available to the public, including on community-contributed platforms (CCPs) like Kaggle or HuggingFace. While open data is important to enhance the redistribution of data's public value, we find that the current CCP governance model fails to uphold the quality needed and recommended practices for sharing, documenting, and evaluating datasets. In this paper, we conduct an analysis of publicly available machine learning datasets on CCPs, discussing datasets' context, and identifying limitations and gaps in the current CCP landscape. We highlight differences between MI and computer vision datasets, particularly in the potentially harmful downstream effects from poor adoption of recommended dataset management practices. We compare the analyzed datasets across several dimensions, including data sharing, data documentation, and maintenance. We find vague licenses, lack of persistent identifiers and storage, duplicates, and missing metadata, with differences between the platforms. Our research contributes to efforts in responsible data curation and AI algorithms for healthcare.
Submitted 30 October, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI
Authors:
Hubert D. Zając,
Natalia R. Avlona,
Tariq O. Andersen,
Finn Kensing,
Irina Shklovski
Abstract:
One of the core goals of responsible AI development is ensuring high-quality training datasets. Many researchers have pointed to the importance of the annotation step in the creation of high-quality data, but less attention has been paid to the work that enables data annotation. We define this work as the design of ground truth schema and explore the challenges involved in the creation of datasets in the medical domain even before any annotations are made. Based on extensive work in three health-tech organisations, we describe five external and internal factors that condition medical dataset creation processes. Three external factors include regulatory constraints, the context of creation and use, and commercial and operational pressures. These factors condition medical data collection and shape the ground truth schema design. Two internal factors include epistemic differences and limits of labelling. These directly shape the design of the ground truth schema. Discussions of what constitutes high-quality data need to pay attention to the factors that shape and constrain what is possible to be created, to ensure responsible AI design.
Submitted 12 August, 2023;
originally announced September 2023.