-
MAIRA-2: Grounded Radiology Report Generation
Authors:
Shruthi Bannur,
Kenza Bouzid,
Daniel C. Castro,
Anton Schwaighofer,
Anja Thieme,
Sam Bond-Taylor,
Maximilian Ilse,
Fernando Pérez-García,
Valentina Salvatelli,
Harshita Sharma,
Felix Meissen,
Mercy Ranjit,
Shaury Srivastav,
Julia Gong,
Noel C. F. Codella,
Fabian Falck,
Ozan Oktay,
Matthew P. Lungren,
Maria Teodora Wetscherek,
Javier Alvarez-Valle,
Stephanie L. Hyland
Abstract:
Radiology reporting is a complex task requiring detailed medical image understanding and precise language generation, for which generative multimodal models offer a promising solution. However, to impact clinical practice, models must achieve a high level of both verifiable performance and utility. We augment the utility of automated report generation by incorporating localisation of individual findings on the image - a task we call grounded report generation - and enhance performance by incorporating realistic reporting context as inputs. We design a novel evaluation framework (RadFact) leveraging the logical inference capabilities of large language models (LLMs) to quantify report correctness and completeness at the level of individual sentences, while supporting the new task of grounded reporting. We develop MAIRA-2, a large radiology-specific multimodal model designed to generate chest X-ray reports with and without grounding. MAIRA-2 achieves state of the art on existing report generation benchmarks and establishes the novel task of grounded report generation.
Submitted 20 September, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology
Authors:
Anja Thieme,
Abhijith Rajamohan,
Benjamin Cooper,
Heather Groombridge,
Robert Simister,
Barney Wong,
Nicholas Woznitza,
Mark Ames Pinnock,
Maria Teodora Wetscherek,
Cecily Morrison,
Hannah Richardson,
Fernando Pérez-García,
Stephanie L. Hyland,
Shruthi Bannur,
Daniel C. Castro,
Kenza Bouzid,
Anton Schwaighofer,
Mercy Ranjit,
Harshita Sharma,
Matthew P. Lungren,
Ozan Oktay,
Javier Alvarez-Valle,
Aditya Nori,
Stephen Harris,
Joseph Jacob
Abstract:
Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images to reduce risks of sub-optimally or critically placed NGTs being missed or delayed in their detection, but gaps remain in clinical practice integration. In this study, we present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped understand challenges in existing workflows, and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.
Submitted 8 May, 2024;
originally announced May 2024.
-
Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology
Authors:
Nur Yildirim,
Hannah Richardson,
Maria T. Wetscherek,
Junaid Bajwa,
Joseph Jacob,
Mark A. Pinnock,
Stephen Harris,
Daniel Coelho de Castro,
Shruthi Bannur,
Stephanie L. Hyland,
Pratik Ghosh,
Mercy Ranjit,
Kenza Bouzid,
Anton Schwaighofer,
Fernando Pérez-García,
Harshita Sharma,
Ozan Oktay,
Matthew Lungren,
Javier Alvarez-Valle,
Aditya Nori,
Anja Thieme
Abstract:
Recent advances in AI combine large language models (LLMs) with vision encoders that bring forward unprecedented technical capabilities to leverage for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve good performance results for tasks such as generating radiology findings based on a patient's medical image, or answering visual questions (e.g., 'Where are the nodules in this chest X-ray?'). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. We studied these concepts with 13 radiologists and clinicians who assessed the VLM concepts as valuable, yet articulated many design considerations. Reflecting on our findings, we discuss implications for integrating VLM capabilities in radiology, and for healthcare AI more generally.
Submitted 21 February, 2024;
originally announced February 2024.
-
Exploring scalable medical image encoders beyond text supervision
Authors:
Fernando Pérez-García,
Harshita Sharma,
Sam Bond-Taylor,
Kenza Bouzid,
Valentina Salvatelli,
Maximilian Ilse,
Shruthi Bannur,
Daniel C. Castro,
Anton Schwaighofer,
Matthew P. Lungren,
Maria Teodora Wetscherek,
Noel Codella,
Stephanie L. Hyland,
Javier Alvarez-Valle,
Ozan Oktay
Abstract:
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, the computed features are limited by the information contained in the text, which is particularly problematic in medical imaging, where the findings described by radiologists focus on specific observations. This challenge is compounded by the scarcity of paired imaging-text data due to concerns over leakage of personal health information. In this work, we fundamentally challenge the prevailing reliance on language supervision for learning general-purpose biomedical imaging encoders. We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data that obtains similar or greater performance than state-of-the-art biomedical language-supervised models on a diverse range of benchmarks. Specifically, the quality of learned representations is evaluated on standard imaging tasks (classification and semantic segmentation), and a vision-language alignment task (text report generation from images). To further demonstrate the drawback of language supervision, we show that features from RAD-DINO correlate with other medical records (e.g., sex or age) better than language-supervised models, which are generally not mentioned in radiology reports. Finally, we conduct a series of ablations determining the factors in RAD-DINO's performance; notably, we observe that RAD-DINO's downstream performance scales well with the quantity and diversity of training data, demonstrating that image-only supervision is a scalable approach for training a foundational biomedical image encoder. Model weights of RAD-DINO trained on publicly available datasets are available at https://huggingface.co/microsoft/rad-dino.
Submitted 7 February, 2025; v1 submitted 19 January, 2024;
originally announced January 2024.
-
RadEdit: stress-testing biomedical vision models via diffusion image editing
Authors:
Fernando Pérez-García,
Sam Bond-Taylor,
Pedro P. Sanchez,
Boris van Breugel,
Daniel C. Castro,
Harshita Sharma,
Valentina Salvatelli,
Maria T. A. Wetscherek,
Hannah Richardson,
Matthew P. Lungren,
Aditya Nori,
Javier Alvarez-Valle,
Ozan Oktay,
Maximilian Ilse
Abstract:
Biomedical imaging datasets are often small and biased, meaning that real-world performance of predictive models can be substantially lower than expected from internal testing. This work proposes using generative image editing to simulate dataset shifts and diagnose failure modes of biomedical vision models; this can be used in advance of deployment to assess readiness, potentially reducing cost and patient harm. Existing editing methods can produce undesirable changes, with spurious correlations learned due to the co-occurrence of disease and treatment interventions, limiting practical applicability. To address this, we train a text-to-image diffusion model on multiple chest X-ray datasets and introduce a new editing method RadEdit that uses multiple masks, if present, to constrain changes and ensure consistency in the edited images. We consider three types of dataset shifts: acquisition shift, manifestation shift, and population shift, and demonstrate that our approach can diagnose failures and quantify model robustness without additional data collection, complementing more qualitative tools for explainable AI.
Submitted 3 April, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
MAIRA-1: A specialised large multimodal model for radiology report generation
Authors:
Stephanie L. Hyland,
Shruthi Bannur,
Kenza Bouzid,
Daniel C. Castro,
Mercy Ranjit,
Anton Schwaighofer,
Fernando Pérez-García,
Valentina Salvatelli,
Shaury Srivastav,
Anja Thieme,
Noel Codella,
Matthew P. Lungren,
Maria Teodora Wetscherek,
Ozan Oktay,
Javier Alvarez-Valle
Abstract:
We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.
Submitted 26 April, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Exploring the Boundaries of GPT-4 in Radiology
Authors:
Qianchu Liu,
Stephanie Hyland,
Shruthi Bannur,
Kenza Bouzid,
Daniel C. Castro,
Maria Teodora Wetscherek,
Robert Tinn,
Harshita Sharma,
Fernando Pérez-García,
Anton Schwaighofer,
Pranav Rajpurkar,
Sameer Tajdin Khanna,
Hoifung Poon,
Naoto Usuyama,
Anja Thieme,
Aditya V. Nori,
Matthew P. Lungren,
Ozan Oktay,
Javier Alvarez-Valle
Abstract:
The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on the text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and we found GPT-4 either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains ($\approx$ 10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference ($F_1$). For tasks that require learning dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows GPT-4 has a sufficient level of radiology knowledge with only occasional errors in complex context that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually-written impressions.
Submitted 23 October, 2023;
originally announced October 2023.
-
No Fair Lunch: A Causal Perspective on Dataset Bias in Machine Learning for Medical Imaging
Authors:
Charles Jones,
Daniel C. Castro,
Fabio De Sousa Ribeiro,
Ozan Oktay,
Melissa McCradden,
Ben Glocker
Abstract:
As machine learning methods gain prominence within clinical decision-making, addressing fairness concerns becomes increasingly urgent. Despite considerable work dedicated to detecting and ameliorating algorithmic bias, today's methods are deficient with potentially harmful consequences. Our causal perspective sheds new light on algorithmic bias, highlighting how different sources of dataset bias may appear indistinguishable yet require substantially different mitigation strategies. We introduce three families of causal bias mechanisms stemming from disparities in prevalence, presentation, and annotation. Our causal analysis underscores how current mitigation methods tackle only a narrow and often unrealistic subset of scenarios. We provide a practical three-step framework for reasoning about fairness in medical imaging, supporting the development of safe and equitable AI prediction models.
Submitted 31 July, 2023;
originally announced July 2023.
-
Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query
Authors:
Ho Hin Lee,
Alberto Santamaria-Pang,
Jameson Merkow,
Ozan Oktay,
Fernando Pérez-García,
Javier Alvarez-Valle,
Ivan Tarapov
Abstract:
We introduce a novel Region-based contrastive pretraining for Medical Image Retrieval (RegionMIR) that demonstrates the feasibility of medical image retrieval with similar anatomical regions. RegionMIR addresses two major challenges for medical image retrieval: i) standardization of clinically relevant searching criteria (e.g., anatomical, pathology-based), and ii) localization of anatomical areas of interest that are semantically meaningful. In this work, we propose an ROI image retrieval network that retrieves images with similar anatomy by extracting anatomical features (via bounding boxes) and evaluating similarity between pairwise anatomy-categorized features of the query and the database of images using contrastive learning. ROI queries are encoded using a contrastive-pretrained encoder that was fine-tuned for anatomy classification, which generates an anatomy-specific latent space for region-correlated image retrieval. During retrieval, we compare the anatomically encoded query to find similar features within a feature database generated from training samples, and retrieve images with similar regions from training samples. We evaluate our approach on both anatomy classification and image retrieval tasks using the Chest ImaGenome Dataset. Our proposed strategy yields an improvement over state-of-the-art pretraining and co-training strategies, from 92.24 to 94.12 (2.03%) classification accuracy in anatomies. We qualitatively evaluate the image retrieval performance demonstrating generalizability across multiple anatomies with different morphology.
Submitted 9 May, 2023;
originally announced May 2023.
-
Compositional Zero-Shot Domain Transfer with Text-to-Text Models
Authors:
Fangyu Liu,
Qianchu Liu,
Shruthi Bannur,
Fernando Pérez-García,
Naoto Usuyama,
Sheng Zhang,
Tristan Naumann,
Aditya Nori,
Hoifung Poon,
Javier Alvarez-Valle,
Ozan Oktay,
Stephanie L. Hyland
Abstract:
Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5 - domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from MLM of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: we simultaneously train NLG for in-domain label-to-data generation which enables data augmentation for self-finetuning and NLU for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.
Submitted 23 March, 2023;
originally announced March 2023.
-
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
Authors:
Shruthi Bannur,
Stephanie Hyland,
Qianchu Liu,
Fernando Pérez-García,
Maximilian Ilse,
Daniel C. Castro,
Benedikt Boecking,
Harshita Sharma,
Kenza Bouzid,
Anja Thieme,
Anton Schwaighofer,
Maria Wetscherek,
Matthew P. Lungren,
Aditya Nori,
Javier Alvarez-Valle,
Ozan Oktay
Abstract:
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. Prior work in biomedical VLP has mostly relied on the alignment of single image and report pairs even though clinical notes commonly refer to prior images. This not only introduces poor alignment between the modalities but also misses an opportunity to exploit rich self-supervision through existing temporal content in the data. In this work, we explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model. It is designed to be versatile to arising challenges such as pose variations and missing input images across time. The resulting model excels on downstream tasks both in single- and multi-image setups, achieving state-of-the-art performance on (I) progression classification, (II) phrase grounding, and (III) report generation, whilst offering consistent improvements on disease classification and sentence-similarity tasks. We release a novel multi-modal temporal benchmark dataset, MS-CXR-T, to quantify the quality of vision-language representations in terms of temporal semantics. Our experimental results show the advantages of incorporating prior images and reports to make the most use of the data.
Submitted 16 March, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
-
Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing
Authors:
Benedikt Boecking,
Naoto Usuyama,
Shruthi Bannur,
Daniel C. Castro,
Anton Schwaighofer,
Stephanie Hyland,
Maria Wetscherek,
Tristan Naumann,
Aditya Nori,
Javier Alvarez-Valle,
Hoifung Poon,
Ozan Oktay
Abstract:
Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision--language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision--language processing. We release a language model that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports. Further, we propose a self-supervised joint vision--language approach with a focus on better text modelling. It establishes new state of the art results on a wide range of publicly available benchmarks, in part by leveraging our new domain-specific language model. We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision--language processing. A broad evaluation, including on this new dataset, shows that our contrastive learning approach, aided by textual-semantic modelling, outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.
Submitted 21 July, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Active label cleaning for improved dataset quality under resource constraints
Authors:
Melanie Bernhardt,
Daniel C. Castro,
Ryutaro Tanno,
Anton Schwaighofer,
Kerem C. Tezcan,
Miguel Monteiro,
Shruthi Bannur,
Matthew Lungren,
Aditya Nori,
Ben Glocker,
Javier Alvarez-Valle,
Ozan Oktay
Abstract:
Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models and have an often-overlooked confounding effect on the assessment of model performance. Nevertheless, employing experts to remove label noise by fully re-annotating large datasets is infeasible in resource-constrained settings, such as healthcare. This work advocates for a data-driven approach to prioritising samples for re-annotation - which we term "active label cleaning". We propose to rank instances according to estimated label correctness and labelling difficulty of each sample, and introduce a simulation framework to evaluate relabelling efficacy. Our experiments on natural images and on a new medical imaging benchmark show that cleaning noisy labels mitigates their negative impact on model training, evaluation, and selection. Crucially, the proposed active label cleaning enables correcting labels up to 4 times more effectively than typical random selection in realistic conditions, making better use of experts' valuable time for improving dataset quality.
Submitted 10 February, 2022; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Hierarchical Analysis of Visual COVID-19 Features from Chest Radiographs
Authors:
Shruthi Bannur,
Ozan Oktay,
Melanie Bernhardt,
Anton Schwaighofer,
Rajesh Jena,
Besmira Nushi,
Sharan Wadhwani,
Aditya Nori,
Kal Natarajan,
Shazad Ashraf,
Javier Alvarez-Valle,
Daniel C. Castro
Abstract:
Chest radiography has been a recommended procedure for patient triaging and resource management in intensive care units (ICUs) throughout the COVID-19 pandemic. Machine learning efforts to augment this workflow have long been challenged by deficiencies in reporting, model evaluation, and failure mode analysis. To address some of those shortcomings, we model radiological features with a human-interpretable class hierarchy that aligns with the radiological decision process. Also, we propose the use of a data-driven error analysis methodology to uncover the blind spots of our model, providing further transparency on its clinical utility. For example, our experiments show that model failures highly correlate with ICU imaging conditions and with the inherent difficulty in distinguishing certain types of radiological features. Also, our hierarchical interpretation and analysis facilitates the comparison with respect to radiologists' findings and inter-variability, which in turn helps us to better assess the clinical applicability of models.
Submitted 14 July, 2021;
originally announced July 2021.
-
Image-and-Spatial Transformer Networks for Structure-Guided Image Registration
Authors:
Matthew C. H. Lee,
Ozan Oktay,
Andreas Schuh,
Michiel Schaap,
Ben Glocker
Abstract:
Image registration with deep neural networks has become an active field of research and an exciting avenue for a long-standing problem in medical imaging. The goal is to learn a complex function that maps the appearance of input image pairs to parameters of a spatial transformation in order to align corresponding anatomical structures. We argue and show that the current direct, non-iterative approaches are sub-optimal, in particular if we seek accurate alignment of Structures-of-Interest (SoI). Information about SoI is often available at training time, for example, in the form of segmentations or landmarks. We introduce a novel, generic framework, Image-and-Spatial Transformer Networks (ISTNs), to leverage SoI information allowing us to learn new image representations that are optimised for the downstream registration task. Thanks to these representations we can employ a test-specific, iterative refinement over the transformation parameters which yields highly accurate registration even with very limited training data. Performance is demonstrated on pairwise 3D brain registration and illustrative synthetic data.
Submitted 22 July, 2019;
originally announced July 2019.
-
Explainable Anatomical Shape Analysis through Deep Hierarchical Generative Models
Authors:
Carlo Biffi,
Juan J. Cerrolaza,
Giacomo Tarroni,
Wenjia Bai,
Antonio de Marvao,
Ozan Oktay,
Christian Ledig,
Loic Le Folgoc,
Konstantinos Kamnitsas,
Georgia Doumou,
Jinming Duan,
Sanjay K. Prasad,
Stuart A. Cook,
Declan P. O'Regan,
Daniel Rueckert
Abstract:
Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer's disease when tested on ADNI data. More importantly, it enabled the visualisation in three dimensions of both global and regional anatomical features which better discriminate between the conditions under examination. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.
Submitted 4 January, 2020; v1 submitted 28 June, 2019;
originally announced July 2019.
-
Automated Quality Control in Image Segmentation: Application to the UK Biobank Cardiac MR Imaging Study
Authors:
Robert Robinson,
Vanya V. Valindria,
Wenjia Bai,
Ozan Oktay,
Bernhard Kainz,
Hideaki Suzuki,
Mihir M. Sanghvi,
Nay Aung,
José Miguel Paiva,
Filip Zemrak,
Kenneth Fung,
Elena Lukaschuk,
Aaron M. Lee,
Valentina Carapella,
Young Jin Kim,
Stefan K. Piechnik,
Stefan Neubauer,
Steffen E. Petersen,
Chris Page,
Paul M. Matthews,
Daniel Rueckert,
Ben Glocker
Abstract:
Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools, e.g. image segmentation methods, are employed to derive quantitative measures or biomarkers for later analyses. Manual inspection and visual QC of each segmentation is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails so as to avoid inclusion of wrong measurements into subsequent analyses which could lead to incorrect conclusions. Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy, which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale manually-annotated set of 4,800 cardiac magnetic resonance scans. We then apply our method to a large cohort of 7,250 cardiac MRI on which we have performed manual QC. Results: We report results used for predicting segmentation quality metrics including Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low and high quality segmentations using predicted DSC scores. As further validation we show high correlation between real and predicted scores and 95% classification accuracy on 4,800 scans for which manual segmentations were available. We mimic real-world application of the method on 7,250 cardiac MRI where we show good agreement between predicted quality metrics and manual visual QC scores. Conclusions: We show that RCA has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging as in the UK Biobank Imaging Study.
Submitted 27 January, 2019;
originally announced January 2019.
-
Weakly Supervised Estimation of Shadow Confidence Maps in Fetal Ultrasound Imaging
Authors:
Qingjie Meng,
Matthew Sinclair,
Veronika Zimmer,
Benjamin Hou,
Martin Rajchl,
Nicolas Toussaint,
Ozan Oktay,
Jo Schlemper,
Alberto Gomez,
James Housden,
Jacqueline Matthew,
Daniel Rueckert,
Julia Schnabel,
Bernhard Kainz
Abstract:
Detecting acoustic shadows in ultrasound images is important in many clinical and engineering applications. Real-time feedback of acoustic shadows can guide sonographers to a standardized diagnostic viewing plane with minimal artifacts and can provide additional information for other automatic image analysis algorithms. However, automatically detecting shadow regions using learning-based algorithms is challenging because pixel-wise ground truth annotation of acoustic shadows is subjective and time consuming. In this paper we propose a weakly supervised method for automatic confidence estimation of acoustic shadow regions. Our method is able to generate a dense shadow-focused confidence map. In our method, a shadow-seg module is built to learn general shadow features for shadow segmentation, based on global image-level annotations as well as a small number of coarse pixel-wise shadow annotations. A transfer function is introduced to extend the obtained binary shadow segmentation to a reference confidence map. Additionally, a confidence estimation network is proposed to learn the mapping between input images and the reference confidence maps. This network is able to predict shadow confidence maps directly from input images during inference. We use evaluation metrics such as DICE and inter-class correlation to verify the effectiveness of our method. Our method is more consistent than human annotation, and outperforms the state-of-the-art quantitatively in shadow segmentation and qualitatively in confidence estimation of shadow regions. We further demonstrate the applicability of our method by integrating shadow confidence maps into tasks such as ultrasound image classification, multi-view image fusion and automated biometric measurements.
Submitted 6 May, 2019; v1 submitted 20 November, 2018;
originally announced November 2018.
-
A Comprehensive Approach for Learning-based Fully-Automated Inter-slice Motion Correction for Short-Axis Cine Cardiac MR Image Stacks
Authors:
Giacomo Tarroni,
Ozan Oktay,
Matthew Sinclair,
Wenjia Bai,
Andreas Schuh,
Hideaki Suzuki,
Antonio de Marvao,
Declan O'Regan,
Stuart Cook,
Daniel Rueckert
Abstract:
In the clinical routine, short axis (SA) cine cardiac MR (CMR) image stacks are acquired during multiple subsequent breath-holds. If the patient cannot consistently hold the breath at the same position, the acquired image stack will be affected by inter-slice respiratory motion and will not correctly represent the cardiac volume, introducing potential errors in the following analyses and visualisations. We propose an approach to automatically correct inter-slice respiratory motion in SA CMR image stacks. Our approach makes use of probabilistic segmentation maps (PSMs) of the left ventricular (LV) cavity generated with decision forests. PSMs are generated for each slice of the SA stack and rigidly registered in-plane to a target PSM. If long axis (LA) images are available, PSMs are generated for them and combined to create the target PSM; if not, the target PSM is produced from the same stack using a 3D model trained from motion-free stacks. The proposed approach was tested on a dataset of SA stacks acquired from 24 healthy subjects (for which anatomical 3D cardiac images were also available as reference) and compared to two techniques which use LA intensity images and LA segmentations as targets, respectively. The results show the accuracy and robustness of the proposed approach in motion compensation.
Submitted 3 October, 2018;
originally announced October 2018.
-
Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images
Authors:
Jo Schlemper,
Ozan Oktay,
Michiel Schaap,
Mattias Heinrich,
Bernhard Kainz,
Ben Glocker,
Daniel Rueckert
Abstract:
We propose a novel attention gate (AG) model for medical image analysis that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules when using convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN models such as VGG or U-Net architectures with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed AG models are evaluated on a variety of tasks, including medical image classification and segmentation. For classification, we demonstrate the use case of AGs in scan plane detection for fetal ultrasound screening. We show that the proposed attention mechanism can provide efficient object localisation while improving the overall prediction performance by reducing false positives. For segmentation, the proposed architecture is evaluated on two large 3D CT abdominal datasets with manual annotations for multiple organs. Experimental results show that AG models consistently improve the prediction performance of the base architectures across different datasets and training sizes while preserving computational efficiency. Moreover, AGs guide the model activations to be focused around salient regions, which provides better insights into how model predictions are made. The source code for the proposed AG models is publicly available.
Submitted 19 January, 2019; v1 submitted 22 August, 2018;
originally announced August 2018.
-
Recurrent neural networks for aortic image sequence segmentation with sparse annotations
Authors:
Wenjia Bai,
Hideaki Suzuki,
Chen Qin,
Giacomo Tarroni,
Ozan Oktay,
Paul M. Matthews,
Daniel Rueckert
Abstract:
Segmentation of image sequences is an important task in medical image analysis, which enables clinicians to assess the anatomy and function of moving organs. However, direct application of a segmentation algorithm to each time frame of a sequence may ignore the temporal continuity inherent in the sequence. In this work, we propose an image sequence segmentation algorithm by combining a fully convolutional network with a recurrent neural network, which incorporates both spatial and temporal information into the segmentation task. A key challenge in training this network is that the available manual annotations are temporally sparse, which forbids end-to-end training. We address this challenge by performing non-rigid label propagation on the annotations and introducing an exponentially weighted loss function for training. Experiments on aortic MR image sequences demonstrate that the proposed method significantly improves both accuracy and temporal smoothness of segmentation, compared to a baseline method that utilises spatial information only. It achieves an average Dice metric of 0.960 for the ascending aorta and 0.953 for the descending aorta.
Submitted 1 August, 2018;
originally announced August 2018.
-
Learning Interpretable Anatomical Features Through Deep Generative Models: Application to Cardiac Remodeling
Authors:
Carlo Biffi,
Ozan Oktay,
Giacomo Tarroni,
Wenjia Bai,
Antonio De Marvao,
Georgia Doumou,
Martin Rajchl,
Reem Bedair,
Sanjay Prasad,
Stuart Cook,
Declan O'Regan,
Daniel Rueckert
Abstract:
Alterations in the geometry and function of the heart define well-established causes of cardiovascular disease. However, current approaches to the diagnosis of cardiovascular diseases often rely on subjective human assessment as well as manual analysis of medical images. Both factors limit the sensitivity in quantifying complex structural and functional phenotypes. Deep learning approaches have recently achieved success for tasks such as classification or segmentation of medical images, but lack interpretability in the feature extraction and decision processes, limiting their value in clinical diagnosis. In this work, we propose a 3D convolutional generative model for automatic classification of images from patients with cardiac diseases associated with structural remodeling. The model leverages interpretable task-specific anatomic patterns learned from 3D segmentations. It further allows us to visualise and quantify the learned pathology-specific remodeling patterns in the original input space of the images. This approach yields high accuracy in the categorization of healthy and hypertrophic cardiomyopathy subjects when tested on unseen MR images from our own multi-centre dataset (100%) as well as on the ACDC MICCAI 2017 dataset (90%). We believe that the proposed deep learning approach is a promising step towards the development of interpretable classifiers for the medical imaging domain, which may help clinicians to improve diagnostic accuracy and enhance patient risk-stratification.
Submitted 18 July, 2018;
originally announced July 2018.
-
Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction
Authors:
Maximilian Seitzer,
Guang Yang,
Jo Schlemper,
Ozan Oktay,
Tobias Würfl,
Vincent Christlein,
Tom Wong,
Raad Mohiaddin,
David Firmin,
Jennifer Keegan,
Daniel Rueckert,
Andreas Maier
Abstract:
Deep learning approaches have shown promising performance for compressed sensing-based Magnetic Resonance Imaging. While deep neural networks trained with mean squared error (MSE) loss functions can achieve high peak signal to noise ratio, the reconstructed images are often blurry and lack sharp details, especially for higher undersampling rates. Recently, adversarial and perceptual loss functions have been shown to achieve more visually appealing results. However, it remains an open question how to (1) optimally combine these loss functions with the MSE loss function and (2) evaluate such a perceptual enhancement. In this work, we propose a hybrid method, in which a visual refinement component is learnt on top of an MSE loss-based reconstruction network. In addition, we introduce a semantic interpretability score, measuring the visibility of the region of interest in both ground truth and reconstructed images, which allows us to objectively quantify the usefulness of the image quality for image post-processing and analysis. Applied on a large cardiac MRI dataset simulated with 8-fold undersampling, we demonstrate significant improvements ($p<0.01$) over the state-of-the-art in both a human observer study and the semantic interpretability score.
Submitted 28 June, 2018;
originally announced June 2018.
-
Real-time Prediction of Segmentation Quality
Authors:
Robert Robinson,
Ozan Oktay,
Wenjia Bai,
Vanya Valindria,
Mihir Sanghvi,
Nay Aung,
José Paiva,
Filip Zemrak,
Kenneth Fung,
Elena Lukaschuk,
Aaron Lee,
Valentina Carapella,
Young Jin Kim,
Bernhard Kainz,
Stefan Piechnik,
Stefan Neubauer,
Steffen Petersen,
Chris Page,
Daniel Rueckert,
Ben Glocker
Abstract:
Recent advances in deep learning based image segmentation methods have enabled real-time performance with human-level accuracy. However, occasionally even the best method fails due to low image quality, artifacts or unexpected behaviour of black box algorithms. Being able to predict segmentation quality in the absence of ground truth is of paramount importance in clinical practice, but also in large-scale studies to avoid the inclusion of invalid data in subsequent analysis.
In this work, we propose two approaches to real-time automated quality control for cardiovascular MR segmentations using deep learning. First, we train a neural network on 12,880 samples to predict Dice Similarity Coefficients (DSC) on a per-case basis. We report a mean absolute error (MAE) of 0.03 on 1,610 test samples and 97% binary classification accuracy for separating low and high quality segmentations. Secondly, in the scenario where no manually annotated data is available, we train a network to predict DSC scores from estimated quality obtained via a reverse testing strategy. We report an MAE=0.14 and 91% binary classification accuracy for this case. Predictions are obtained in real-time which, when combined with real-time segmentation methods, enables instant feedback on whether an acquired scan is analysable while the patient is still in the scanner. This further enables new applications of optimising image acquisition towards best possible analysis results.
Submitted 16 June, 2018;
originally announced June 2018.
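The quantities named in this abstract (DSC, MAE between predicted and real DSC, and binary low/high quality accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, masks are assumed to be flat 0/1 sequences, and the 0.7 quality threshold is an assumption made only for illustration, since the listing does not state one.

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two flat binary masks (0/1 sequences)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and real scores."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def binary_accuracy(predicted, reference, threshold=0.7):
    """Fraction of cases where predicted and real DSC fall on the same side
    of the quality threshold (the threshold value is a hypothetical choice)."""
    hits = sum((p >= threshold) == (r >= threshold)
               for p, r in zip(predicted, reference))
    return hits / len(reference)


# Real DSC of three toy segmentations against one ground-truth mask.
truth = [1, 1, 0, 0]
real_dsc = [dice(seg, truth) for seg in ([1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1])]
# Scores a quality-prediction model might output (made-up numbers).
predicted_dsc = [0.9, 0.75, 0.1]

print(mean_absolute_error(predicted_dsc, real_dsc))
print(binary_accuracy(predicted_dsc, real_dsc))
```

The per-case framing matters here: both error measures are computed over one score per segmentation, not per pixel.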
-
Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents
Authors:
Amir Alansary,
Loic Le Folgoc,
Ghislain Vaillant,
Ozan Oktay,
Yuanwei Li,
Wenjia Bai,
Jonathan Passerat-Palmbach,
Ricardo Guerrero,
Konstantinos Kamnitsas,
Benjamin Hou,
Steven McDonagh,
Ben Glocker,
Bernhard Kainz,
Daniel Rueckert
Abstract:
We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and detected planes, and the angles between their normal vector and target. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracy of 1.53mm, 1.98mm and 4.84mm, respectively.
Submitted 8 June, 2018;
originally announced June 2018.
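The evaluation described in this abstract (distance between anatomical landmarks and the detected plane, and angle between plane normals) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, a plane is assumed to be given by a point and a normal vector, and treating normals as sign-free (so the angle lies in [0°, 90°]) is an assumption rather than something the listing specifies.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _norm(u):
    return math.sqrt(_dot(u, u))


def angle_between_normals_deg(n1, n2):
    """Angle in degrees between two plane normals.

    Assumption: a plane's normal is only defined up to sign, so the absolute
    value of the cosine is used and the result lies in [0, 90].
    """
    cos = _dot(n1, n2) / (_norm(n1) * _norm(n2))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(abs(cos)))


def point_to_plane_distance(point, plane_point, normal):
    """Unsigned distance from a 3D point to the plane (plane_point, normal)."""
    diff = [p - q for p, q in zip(point, plane_point)]
    return abs(_dot(diff, normal)) / _norm(normal)


def mean_landmark_distance(landmarks, plane_point, normal):
    """Mean distance of a set of anatomical landmarks to a detected plane."""
    return sum(point_to_plane_distance(l, plane_point, normal)
               for l in landmarks) / len(landmarks)


# Toy example: detected plane z = 0, landmarks slightly off the plane (in mm).
landmarks = [(10.0, 5.0, 1.0), (-3.0, 2.0, -2.0)]
print(mean_landmark_distance(landmarks, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
print(angle_between_normals_deg((0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```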
-
Attention-Gated Networks for Improving Ultrasound Scan Plane Detection
Authors:
Jo Schlemper,
Ozan Oktay,
Liang Chen,
Jacqueline Matthew,
Caroline Knight,
Bernhard Kainz,
Ben Glocker,
Daniel Rueckert
Abstract:
In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due to the poor image quality resulting in low interpretability for both clinicians and automated algorithms. To solve this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and it can be easily incorporated into any existing classification architectures, while only requiring a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model's reasoning process, which can also be used for weakly supervised object localisation.
Submitted 15 April, 2018;
originally announced April 2018.
-
Attention U-Net: Learning Where to Look for the Pancreas
Authors:
Ozan Oktay,
Jo Schlemper,
Loic Le Folgoc,
Matthew Lee,
Mattias Heinrich,
Kazunari Misawa,
Kensaku Mori,
Steven McDonagh,
Nils Y Hammerla,
Bernhard Kainz,
Ben Glocker,
Daniel Rueckert
Abstract:
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. The proposed Attention U-Net architecture is evaluated on two large CT abdominal datasets for multi-class image segmentation. Experimental results show that AGs consistently improve the prediction performance of U-Net across different datasets and training sizes while preserving computational efficiency. The code for the proposed architecture is publicly available.
Submitted 20 May, 2018; v1 submitted 11 April, 2018;
originally announced April 2018.
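To make the idea of an attention gate concrete, here is a minimal pure-Python sketch of an additive soft-attention gate applied position by position. It is not the released implementation: the abstract gives no formula, so the additive form (linear maps of the features and of a gating signal, a ReLU, then a sigmoid producing a coefficient in (0, 1)) is stated here as an assumption, the gating signal is assumed to be already resampled to the feature resolution, and all names (`attention_gate`, `W_x`, `W_g`, `psi`) are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_gate(x, g, W_x, W_g, b, psi, b_psi):
    """Scale each feature vector x[i] by an attention coefficient alpha[i].

    x: feature vectors, one per spatial position
    g: gating vectors at the same positions (assumed already resampled)
    Returns (gated_features, alphas).
    """
    gated, alphas = [], []
    for x_i, g_i in zip(x, g):
        # Additive attention: combine both linear projections, then ReLU.
        q = [max(0.0, a + c + b_k)
             for a, c, b_k in zip(matvec(W_x, x_i), matvec(W_g, g_i), b)]
        # The sigmoid keeps the coefficient in (0, 1): soft, not hard, attention.
        alpha = sigmoid(sum(p * q_k for p, q_k in zip(psi, q)) + b_psi)
        alphas.append(alpha)
        gated.append([alpha * v for v in x_i])
    return gated, alphas


# Two positions with 2-channel features and a 1-channel gating signal.
x = [[1.0, 2.0], [0.5, -1.0]]
g = [[1.0], [-1.0]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
W_g = [[1.0], [1.0]]
gated, alphas = attention_gate(x, g, W_x, W_g, b=[0.0, 0.0], psi=[1.0, 1.0], b_psi=0.0)
print(alphas)  # the first position is emphasised, the second is damped
```

Because every step is differentiable, such a gate can be trained end-to-end with the rest of the network, which is consistent with the abstract's claim of minimal overhead.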
-
Learning-Based Quality Control for Cardiac MR Images
Authors:
Giacomo Tarroni,
Ozan Oktay,
Wenjia Bai,
Andreas Schuh,
Hideaki Suzuki,
Jonathan Passerat-Palmbach,
Antonio de Marvao,
Declan P. O'Regan,
Stuart Cook,
Ben Glocker,
Paul M. Matthews,
Daniel Rueckert
Abstract:
The effectiveness of a cardiovascular magnetic resonance (CMR) scan depends on the ability of the operator to correctly tune the acquisition parameters to the subject being scanned and on the potential occurrence of imaging artefacts such as cardiac and respiratory motion. In the clinical practice, a quality control step is performed by visual assessment of the acquired images: however, this procedure is strongly operator-dependent, cumbersome and sometimes incompatible with the time constraints in clinical settings and large-scale studies. We propose a fast, fully-automated, learning-based quality control pipeline for CMR images, specifically for short-axis image stacks. Our pipeline performs three important quality checks: 1) heart coverage estimation, 2) inter-slice motion detection, 3) image contrast estimation in the cardiac region. The pipeline uses a hybrid decision forest method - integrating both regression and structured classification models - to extract landmarks as well as probabilistic segmentation maps from both long- and short-axis images as a basis to perform the quality checks. The technique was tested on up to 3000 cases from the UK Biobank as well as on 100 cases from the UK Digital Heart Project, and validated against manual annotations and visual inspections performed by expert interpreters. The results show the capability of the proposed pipeline to correctly detect incomplete or corrupted scans (e.g. on UK Biobank, sensitivity and specificity respectively 88% and 99% for heart coverage estimation, 85% and 95% for motion detection), allowing their exclusion from the analysed dataset or the triggering of a new acquisition.
Submitted 15 September, 2018; v1 submitted 25 March, 2018;
originally announced March 2018.
-
TernaryNet: Faster Deep Model Inference without GPUs for Medical 3D Segmentation using Sparse and Binary Convolutions
Authors:
Mattias P. Heinrich,
Max Blendowski,
Ozan Oktay
Abstract:
Deep convolutional neural networks (DCNN) are currently ubiquitous in medical imaging. While their versatility and high quality results for common image analysis tasks including segmentation, localisation and prediction are astonishing, the large representational power comes at the cost of highly demanding computational effort. This limits their practical applications for image guided interventions and diagnostic (point-of-care) support using mobile devices without graphics processing units (GPU). We propose a new scheme that approximates both trainable weights and neural activations in deep networks by ternary values and tackles the open question of backpropagation when dealing with non-differentiable functions. Our solution enables the removal of the expensive floating-point matrix multiplications throughout any convolutional neural network and replaces them by energy and time preserving binary operators and population counts. Our approach, which is demonstrated using a fully-convolutional network (FCN) for CT pancreas segmentation leads to more than 10-fold reduced memory requirements and we provide a concept for sub-second inference without GPUs. Our ternary approximation obtains high accuracies (without any post-processing) with a Dice overlap of 71.0% that are statistically equivalent to using networks with high-precision weights and activations. We further demonstrate the significant improvements reached in comparison to binary quantisation and without our proposed ternary hyperbolic tangent continuation. We present a key enabling technique for highly efficient DCNN inference without GPUs that will help to bring the advances of deep learning to practical clinical applications. It also has great promise for improving accuracies in large-scale medical data retrieval.
Submitted 29 January, 2018;
originally announced January 2018.
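The replacement of floating-point multiplications by binary operators and population counts can be illustrated with a small sketch. This is a generic illustration, not the authors' scheme: the two-bitmask encoding, the function names, and the fixed-threshold `ternarize` helper are all assumptions (the abstract mentions a ternary hyperbolic tangent, which is not reproduced here).

```python
def ternarize(values, threshold):
    """Map real values to {-1, 0, +1} with a symmetric dead zone.
    Hypothetical stand-in for the quantiser; the threshold is an assumption."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]


def pack_ternary(values):
    """Encode a ternary vector as two integer bitmasks: (+1 positions, -1 positions)."""
    pos = neg = 0
    for i, v in enumerate(values):
        if v == 1:
            pos |= 1 << i
        elif v == -1:
            neg |= 1 << i
        elif v != 0:
            raise ValueError("values must be in {-1, 0, 1}")
    return pos, neg


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def ternary_dot(a, b):
    """Dot product of two packed ternary vectors using only AND and popcount."""
    pos_a, neg_a = a
    pos_b, neg_b = b
    same_sign = popcount(pos_a & pos_b) + popcount(neg_a & neg_b)  # products of +1
    opposite_sign = popcount(pos_a & neg_b) + popcount(neg_a & pos_b)  # products of -1
    return same_sign - opposite_sign


weights = ternarize([0.8, -0.05, -0.6, 0.2], threshold=0.3)
activations = [1, 1, -1, -1]
print(ternary_dot(pack_ternary(weights), pack_ternary(activations)))
```

Zeros contribute to neither mask, so they drop out of the product without any extra work, which is where the sparsity of the ternary representation pays off.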
-
Automated cardiovascular magnetic resonance image analysis with fully convolutional networks
Authors:
Wenjia Bai,
Matthew Sinclair,
Giacomo Tarroni,
Ozan Oktay,
Martin Rajchl,
Ghislain Vaillant,
Aaron M. Lee,
Nay Aung,
Elena Lukaschuk,
Mihir M. Sanghvi,
Filip Zemrak,
Kenneth Fung,
Jose Miguel Paiva,
Valentina Carapella,
Young Jin Kim,
Hideaki Suzuki,
Bernhard Kainz,
Paul M. Matthews,
Steffen E. Petersen,
Stefan K. Piechnik,
Stefan Neubauer,
Ben Glocker,
Daniel Rueckert
Abstract:
Cardiovascular magnetic resonance (CMR) imaging is a standard imaging modality for assessing cardiovascular diseases (CVDs), the leading cause of death globally. CMR enables accurate quantification of the cardiac chamber volume, ejection fraction and myocardial mass, providing information for diagnosis and monitoring of CVDs. However, for years, clinicians have been relying on manual approaches for CMR image analysis, which is time consuming and prone to subjective errors. It is a major clinical challenge to automatically derive quantitative and clinically relevant information from CMR images. Deep neural networks have shown a great potential in image pattern recognition and segmentation for a variety of tasks. Here we demonstrate an automated analysis method for CMR images, which is based on a fully convolutional network (FCN). The network is trained and evaluated on a large-scale dataset from the UK Biobank, consisting of 4,875 subjects with 93,500 pixelwise annotated images. The performance of the method has been evaluated using a number of technical metrics, including the Dice metric, mean contour distance and Hausdorff distance, as well as clinically relevant measures, including left ventricle (LV) end-diastolic volume (LVEDV) and end-systolic volume (LVESV), LV mass (LVM); right ventricle (RV) end-diastolic volume (RVEDV) and end-systolic volume (RVESV). By combining FCN with a large-scale annotated dataset, the proposed automated method achieves a high performance on par with human experts in segmenting the LV and RV on short-axis CMR images and the left atrium (LA) and right atrium (RA) on long-axis CMR images.
Submitted 22 May, 2018; v1 submitted 25 October, 2017;
originally announced October 2017.
-
Anatomically Constrained Neural Networks (ACNN): Application to Cardiac Image Enhancement and Segmentation
Authors:
Ozan Oktay,
Enzo Ferrante,
Konstantinos Kamnitsas,
Mattias Heinrich,
Wenjia Bai,
Jose Caballero,
Stuart Cook,
Antonio de Marvao,
Timothy Dawes,
Declan O'Regan,
Bernhard Kainz,
Ben Glocker,
Daniel Rueckert
Abstract:
Incorporation of prior knowledge about organ shape and location is key to improving the performance of image analysis approaches. In particular, priors can be useful in cases where images are corrupted and contain artefacts due to limitations in image acquisition. The highly constrained nature of anatomical objects can be well captured with learning-based techniques. However, in most recent and promising techniques such as CNN-based segmentation, it is not obvious how to incorporate such prior knowledge. State-of-the-art methods operate as pixel-wise classifiers where the training objectives do not incorporate the structure and inter-dependencies of the output. To overcome this limitation, we propose a generic training strategy that incorporates anatomical prior knowledge into CNNs through a new regularisation model, which is trained end-to-end. The new framework encourages models to follow the global anatomical properties of the underlying anatomy (e.g. shape, label structure) via learned non-linear representations of the shape. We show that the proposed approach can be easily adapted to different analysis tasks (e.g. image enhancement, segmentation) and improves the prediction accuracy of the state-of-the-art models. The applicability of our approach is shown on multi-modal cardiac datasets and public benchmarks. Additionally, we demonstrate how the learned deep models of 3D shapes can be interpreted and used as biomarkers for classification of cardiac pathologies.
Submitted 5 December, 2017; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Context-Sensitive Super-Resolution for Fast Fetal Magnetic Resonance Imaging
Authors:
Steven McDonagh,
Benjamin Hou,
Konstantinos Kamnitsas,
Ozan Oktay,
Amir Alansary,
Mary Rutherford,
Jo V. Hajnal,
Bernhard Kainz
Abstract:
3D Magnetic Resonance Imaging (MRI) is often a trade-off between fast but low-resolution image acquisition and highly detailed but slow image acquisition. Fast imaging is required to avoid motion artefacts when targets move. This is particularly difficult for fetal MRI. Spatially independent upsampling techniques, which are the state of the art for addressing this problem, are error-prone and disregard contextual information. In this paper we propose a context-sensitive upsampling method based on a residual convolutional neural network model that learns organ-specific appearance and adapts semantically to input data, allowing for the generation of high-resolution images with sharp edges and fine-scale detail. By making contextual decisions about appearance and shape, present in different parts of an image, we gain a maximum of structural detail at a similar contrast as provided by high-resolution data. We experiment on 145 fetal scans and show that our approach yields an increased PSNR of 1.25 dB when applied to under-sampled fetal data compared with baseline upsampling. Furthermore, our method yields an increased PSNR of 1.73 dB when utilizing under-sampled fetal data to perform brain volume reconstruction on motion-corrupted captured data.
Submitted 23 September, 2017; v1 submitted 28 February, 2017;
originally announced March 2017.
-
Learning under Distributed Weak Supervision
Authors:
Martin Rajchl,
Matthew C. H. Lee,
Franklin Schrans,
Alice Davidson,
Jonathan Passerat-Palmbach,
Giacomo Tarroni,
Amir Alansary,
Ozan Oktay,
Bernhard Kainz,
Daniel Rueckert
Abstract:
The availability of training data for supervision is a frequently encountered bottleneck of medical image analysis methods. While annotations are typically established by a clinical expert rater, the increase in acquired imaging data renders traditional pixel-wise segmentations less feasible. In this paper, we examine the use of a crowdsourcing platform for the distribution of super-pixel weak annotation tasks and collect such annotations from a crowd of non-expert raters. The crowd annotations are subsequently used for training a fully convolutional neural network to address the problem of fetal brain segmentation in T2-weighted MR images. Using this approach, we report encouraging results compared to highly targeted, fully supervised methods and potentially address a frequent problem impeding image analysis research.
Submitted 3 June, 2016;
originally announced June 2016.
-
DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks
Authors:
Martin Rajchl,
Matthew C. H. Lee,
Ozan Oktay,
Konstantinos Kamnitsas,
Jonathan Passerat-Palmbach,
Wenjia Bai,
Mellisa Damodaram,
Mary A. Rutherford,
Joseph V. Hajnal,
Bernhard Kainz,
Daniel Rueckert
Abstract:
In this paper, we propose DeepCut, a method to obtain pixelwise object segmentations given an image dataset labelled with bounding box annotations. It extends the approach of the well-known GrabCut method to include machine learning by training a neural network classifier from bounding box annotations. We formulate the problem as an energy minimisation problem over a densely-connected conditional random field and iteratively update the training targets to obtain pixelwise object segmentations. Additionally, we propose variants of the DeepCut method and compare those to a naive approach to CNN training under weak supervision. We test its applicability to solve brain and lung segmentation problems on a challenging fetal magnetic resonance dataset and obtain encouraging results in terms of accuracy.
Submitted 5 June, 2016; v1 submitted 25 May, 2016;
originally announced May 2016.
-
Reconstruction and Estimation of Scattering Functions of Overspread Radar Targets
Authors:
Onur Oktay,
Götz Pfander,
Pavel Zheltov
Abstract:
In many radar scenarios, the radar target or the medium is assumed to possess randomly varying parts. The properties of a target are described by a random process known as the spreading function. Its second-order statistics under the WSSUS assumption are given by the scattering function. Recent developments in operator identification theory suggest a channel sounding procedure that allows one to determine the spreading function given complete statistical knowledge of the operator echo. We show that in a continuous model it is indeed theoretically possible to identify a scattering function of an overspread target given full statistics of a received echo from a single sounding by a custom weighted delta train. Our results apply whenever the scattering function is supported on a set of area less than one. Absent such complete statistics, we construct and analyze an estimator that can be used as a replacement for the averaged periodogram estimator in case of poor geometry of the support set of the scattering function.
Submitted 17 November, 2011; v1 submitted 27 June, 2011;
originally announced June 2011.