-
DALL-M: Context-Aware Clinical Data Augmentation with LLMs
Authors:
Chihcheng Hsieh,
Catarina Moreira,
Isabel Blanco Nobre,
Sandra Costa Sousa,
Chun Ouyang,
Margot Brereton,
Joaquim Jorge,
Jacinto C. Nascimento
Abstract:
X-ray images are vital in medical diagnostics, but their effectiveness is limited without clinical context. Radiologists often find chest X-rays insufficient for diagnosing underlying diseases, necessitating the integration of structured clinical features with radiology reports.
To address this, we introduce DALL-M, a novel framework that enhances clinical datasets by generating contextual synthetic data. DALL-M augments structured patient data, including vital signs (e.g., heart rate, oxygen saturation), radiology findings (e.g., lesion presence), and demographic factors. It integrates this tabular data with contextual knowledge extracted from radiology reports and domain-specific resources (e.g., Radiopaedia, Wikipedia), ensuring clinical consistency and reliability.
DALL-M follows a three-phase process: (i) clinical context storage, (ii) expert query generation, and (iii) context-aware feature augmentation. Using large language models (LLMs), it generates both contextual synthetic values for existing clinical features and entirely new, clinically relevant features.
Applied to 799 cases from the MIMIC-IV dataset, DALL-M expanded the original 9 clinical features to 91. Empirical validation with machine learning models (including Decision Trees, Random Forests, XGBoost, and TabNet) demonstrated a 16.5% improvement in F1 score and a 25% increase in Precision and Recall.
DALL-M bridges an important gap in clinical data augmentation by preserving data integrity while enhancing predictive modeling in healthcare. Our results show that integrating LLM-generated synthetic features significantly improves model performance, making DALL-M a scalable and practical approach for AI-driven medical diagnostics.
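The three-phase process described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the function names `store_context`, `generate_queries`, and `augment` are hypothetical, the LLM is modelled as any callable from prompt string to response string (no real LLM API is used), and restricting synthetic values for existing features to empty fields is a simplification. It shows only the data flow, not the authors' implementation.

```python
from typing import Callable, Dict, List

# Hypothetical LLM interface: any function that maps a prompt to a text response.
LLM = Callable[[str], str]


def store_context(report: str, references: Dict[str, str]) -> List[str]:
    """Phase (i), sketch: keep the radiology report plus domain snippets
    (e.g. text taken from Radiopaedia or Wikipedia) as context passages."""
    return [report] + [f"{source}: {text}" for source, text in references.items()]


def generate_queries(features: List[str]) -> List[str]:
    """Phase (ii), sketch: one expert-style question per target feature."""
    return [
        f"Given the context, what is a clinically consistent value for '{f}'?"
        for f in features
    ]


def augment(record: Dict[str, str], new_features: List[str],
            context: List[str], llm: LLM) -> Dict[str, str]:
    """Phase (iii), sketch: ask the LLM for values of empty existing features
    and of entirely new features, leaving the input record unchanged."""
    augmented = dict(record)
    context_text = "\n".join(context)
    targets = [f for f, v in record.items() if v == ""] + new_features
    for feature, query in zip(targets, generate_queries(targets)):
        augmented[feature] = llm(f"{context_text}\n\n{query}").strip()
    return augmented
```

With a stub such as `lambda prompt: "normal"` standing in for the LLM, the record `{"heart_rate": "88", "o2_sat": ""}` augmented with the new feature `"lesion_present"` keeps its heart rate and gains values for both `o2_sat` and `lesion_present`. Injecting the LLM as a plain callable keeps the sketch testable without any model or network access.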
Submitted 15 March, 2025; v1 submitted 11 July, 2024;
originally announced July 2024.
-
MDF-Net for abnormality detection by fusing X-rays with clinical data
Authors:
Chihcheng Hsieh,
Isabel Blanco Nobre,
Sandra Costa Sousa,
Chun Ouyang,
Margot Brereton,
Jacinto C. Nascimento,
Joaquim Jorge,
Catarina Moreira
Abstract:
This study investigates the effects of including patients' clinical information on the performance of deep learning (DL) classifiers for disease location in chest X-ray images. Although current classifiers achieve high performance using chest X-ray images alone, our interviews with radiologists indicate that clinical data is highly informative and essential for interpreting images and making proper diagnoses.
In this work, we propose a novel architecture consisting of two fusion methods that enable the model to simultaneously process patients' clinical data (structured data) and chest X-rays (image data). Since these data modalities are in different dimensional spaces, we propose a spatial arrangement strategy, spatialization, to facilitate the multimodal learning process in a Mask R-CNN model. We performed an extensive experimental evaluation using MIMIC-Eye, a dataset comprising three modalities: MIMIC-CXR (chest X-ray images), MIMIC IV-ED (patients' clinical data), and REFLACX (annotations of disease locations in chest X-rays).
Results show that incorporating patients' clinical data in a DL model together with the proposed fusion methods improves disease localization in chest X-rays by 12% in terms of Average Precision compared to a standard Mask R-CNN using only chest X-rays. Further ablation studies also emphasize the importance of multimodal DL architectures and the incorporation of patients' clinical data in disease localization. The architecture proposed in this work is publicly available to promote the scientific reproducibility of our study (https://github.com/ChihchengHsieh/multimodal-abnormalities-detection).
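The abstract does not say how spatialization arranges clinical data in image space. One common way to bring a feature vector into the same dimensional space as convolutional feature maps is to tile each value over the spatial grid and concatenate the result channel-wise. The sketch below shows that technique with NumPy; the functions `spatialize` and `fuse` are hypothetical, and this is a simplified illustration rather than MDF-Net's actual fusion method.

```python
import numpy as np


def spatialize(clinical: np.ndarray, height: int, width: int) -> np.ndarray:
    """Tile each of the D clinical values over an H x W plane, giving (D, H, W)."""
    d = clinical.shape[0]
    # broadcast_to returns a read-only view, so copy to get an ordinary array.
    return np.broadcast_to(clinical[:, None, None], (d, height, width)).copy()


def fuse(image_features: np.ndarray, clinical: np.ndarray) -> np.ndarray:
    """Concatenate (C, H, W) image features with the spatialized clinical maps
    along the channel axis, giving (C + D, H, W)."""
    _, h, w = image_features.shape
    return np.concatenate([image_features, spatialize(clinical, h, w)], axis=0)
```

For example, a 256-channel 8 x 8 feature map fused with 5 clinical values yields a 261-channel 8 x 8 tensor in which every spatial position carries the full clinical vector, so a downstream convolution can combine image evidence and patient data at each location.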
Submitted 27 December, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
-
Exploring technologies to better link physical evidence and digital information for disaster victim identification
Authors:
David Lovell,
Kellie Vella,
Diego Muñoz,
Matt McKague,
Margot Brereton,
Peter Ellis
Abstract:
Disaster victim identification (DVI) entails a protracted process of evidence collection and data matching to reconcile physical remains with victim identity. Technology is critical to DVI by enabling the linkage of physical evidence to information. However, labelling physical remains and collecting data at the scene are dominated by low-technology paper-based practices. We ask, how can technology help us tag and track the victims of disaster? Our response has two parts. First, we conducted a human-computer interaction led investigation into the systematic factors impacting DVI tagging and tracking processes. Through interviews with Australian DVI practitioners, we explored how technologies to improve linkage might fit with prevailing work practices and preferences; practical and social considerations; and existing systems and processes. Using insights from these interviews and relevant literature, we identified four critical themes: protocols and training; stress and stressors; the plurality of information capture and management systems; and practicalities and constraints. Second, we applied the themes identified in the first part of the investigation to critically review technologies that could support DVI practitioners by enhancing DVI processes that link physical evidence to information. This resulted in an overview of candidate technologies matched with consideration of their key attributes. This study recognises the importance of considering human factors that can affect technology adoption into existing practices. We provide a searchable table (Supplementary Information) that relates technologies to the key attributes relevant to DVI practice, for the reader to apply to their own context. While this research directly contributes to DVI, it also has applications to other domains in which a physical/digital linkage is required, particularly within high-stress environments.
Submitted 29 November, 2021;
originally announced November 2021.
-
ROM-based quantum computation: Experimental explorations using Nuclear Magnetic Resonance, and future prospects
Authors:
D. R. Sypher,
I. M. Brereton,
H. M. Wiseman,
B. L. Hollis,
B. C. Travaglione
Abstract:
ROM-based quantum computation (QC) is an alternative to oracle-based QC. It has the advantages of being less "magical", and being more suited to implementing space-efficient computation (i.e. computation using the minimum number of writable qubits). Here we consider a number of small (one- and two-qubit) quantum algorithms illustrating different aspects of ROM-based QC. They are: (a) a one-qubit algorithm to solve the Deutsch problem; (b) a one-qubit binary multiplication algorithm; (c) a two-qubit controlled binary multiplication algorithm; and (d) a two-qubit ROM-based version of the Deutsch-Jozsa algorithm. For each algorithm we present experimental verification using NMR ensemble QC. The average fidelities for the implementation were in the ranges 0.9-0.97 for the one-qubit algorithms, and 0.84-0.94 for the two-qubit algorithms. We conclude with a discussion of future prospects for ROM-based quantum computation. We propose a four-qubit algorithm, using Grover's iterate, for solving a miniature "real-world" problem relating to the lengths of paths in a network.
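To illustrate how a one-qubit algorithm can solve the Deutsch problem when the function is stored in read-only memory, the sketch below simulates the standard single-qubit phase-oracle formulation: the ROM bits f(0) and f(1) act as classical controls that select which phase gates are applied, so no extra writable qubit is needed. This is an idealized state-vector simulation with NumPy, not an NMR pulse sequence, and it is not necessarily the exact circuit used in the paper; the names `rom_phase_oracle` and `deutsch` are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
X = np.array([[0, 1], [1, 0]])                # bit flip
Z = np.array([[1, 0], [0, -1]])               # phase flip on |1>
I = np.eye(2)


def rom_phase_oracle(rom):
    """Build diag((-1)^f(0), (-1)^f(1)) from classical ROM bits (f(0), f(1))."""
    f0, f1 = rom
    u = I.copy()
    if f0:
        u = X @ Z @ X @ u  # X Z X = diag(-1, 1): phase flip on |0>
    if f1:
        u = Z @ u          # phase flip on |1>
    return u


def deutsch(rom):
    """Start in |0>, apply H, the ROM-controlled phases, then H, and read |0>."""
    state = np.array([1.0, 0.0])
    state = H @ rom_phase_oracle(rom) @ H @ state
    p0 = abs(state[0]) ** 2
    return "constant" if np.isclose(p0, 1.0) else "balanced"
```

A single evaluation distinguishes the cases: `deutsch((0, 0))` and `deutsch((1, 1))` return `"constant"` (the oracle is plus or minus the identity, so the qubit returns to |0>), while `deutsch((0, 1))` and `deutsch((1, 0))` return `"balanced"` (the oracle is plus or minus Z, which the surrounding Hadamards turn into a bit flip).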
Submitted 20 December, 2001;
originally announced December 2001.