
Keeping Medical AI Healthy: A Review of Detection and Correction Methods for System Degradation

Abstract: Artificial intelligence (AI) is increasingly integrated into modern healthcare, offering powerful support for clinical decision-making. However, in real-world settings, AI systems may experience performance degradation over time, due to factors such as shifting data distributions, changes in patient characteristics, evolving clinical protocols, and variations in data quality. These factors can compromise model reliability, posing safety concerns and increasing the likelihood of inaccurate predictions or adverse outcomes. This review presents a forward-looking perspective on monitoring and maintaining the "health" of AI systems in healthcare. We highlight the urgent need for continuous performance monitoring, early degradation detection, and effective self-correction mechanisms. The paper begins by reviewing common causes of performance degradation at both data and model levels. We then summarize key techniques for detecting data and model drift, followed by an in-depth look at root cause analysis. Correction strategies are further reviewed, ranging from model retraining to test-time adaptation. Our survey spans both traditional machine learning models and state-of-the-art large language models (LLMs), offering insights into their strengths and limitations. Finally, we discuss ongoing technical challenges and propose future research directions. This work aims to guide the development of reliable, robust medical AI systems capable of sustaining safe, long-term deployment in dynamic clinical settings.

Submitted 20 June, 2025; originally announced June 2025.

Comments: 15 pages, 5 figures

arXiv:2502.06124 [pdf, other]

Foundation Model of Electronic Medical Records for Adaptive Risk Estimation

Authors: Pawel Renc, Michal K. Grzeszczyk, Nassim Oufattole, Deirdre Goode, Yugang Jia, Szymon Bieganski, Matthew B. A. McDermott, Jaroslaw Was, Anthony E. Samir, Jonathan W. Cunningham, David W. Bates, Arkadiusz Sitek

Abstract: The U.S. allocates nearly 18% of its GDP to healthcare but experiences lower life expectancy and higher preventable death rates compared to other high-income nations. Hospitals struggle to predict critical outcomes such as mortality, ICU admission, and prolonged hospital stays. Traditional early warning systems, like NEWS and MEWS, rely on static variables and fixed thresholds, limiting their adaptability, accuracy, and personalization. We developed the Enhanced Transformer for Health Outcome Simulation (ETHOS), an AI model that tokenizes patient health timelines (PHTs) from EHRs and uses transformer-based architectures to predict future PHTs. The Adaptive Risk Estimation System (ARES) leverages ETHOS to compute dynamic, personalized risk probabilities for clinician-defined critical events. ARES also features a personalized explainability module highlighting key clinical factors influencing risk estimates. We evaluated ARES on the MIMIC-IV v2.2 dataset in emergency department settings, benchmarking its performance against traditional early warning systems and machine learning models. From 299,721 unique patients, 285,622 PHTs (60% with hospital admissions) were processed, comprising over 357 million tokens. ETHOS outperformed benchmark models in predicting hospital admissions, ICU admissions, and prolonged stays, achieving superior AUC scores. Its risk estimates were robust across demographic subgroups, with calibration curves confirming model reliability. The explainability module provided valuable insights into patient-specific risk factors.
ARES, powered by ETHOS, advances predictive healthcare AI by delivering dynamic, real-time, personalized risk estimation with patient-specific explainability. Its adaptability and accuracy offer a transformative tool for clinical decision-making, potentially improving patient outcomes and resource allocation.

Submitted 13 March, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

Comments: Fix affiliation list

arXiv:2410.04304 [pdf]

Robotics Meets Software Engineering: A First Look at the Robotics Discussions on Stackoverflow

Authors: Hisham Kidwai, Danika Passler Bates, Sujana Islam Suhi, James Young, Shaiful Chowdhury

Abstract: Robots can greatly enhance human capabilities, yet their development presents a range of challenges. This collaborative study, conducted by a team of software engineering and robotics researchers, seeks to identify the challenges encountered by robot developers by analyzing questions posted on StackOverflow. We created a filtered dataset of 500 robotics-related questions and examined their characteristics, comparing them with randomly selected questions from the platform. Our findings indicate that the small size of the robotics community limits the visibility of these questions, resulting in fewer responses. While the number of robotics questions has been steadily increasing, they remain less popular than the average question and answer on StackOverflow. This underscores the importance of research that focuses on the challenges faced by robotics practitioners. Consequently, we conducted a thematic analysis of the 500 robotics questions to uncover common inquiry patterns. We identified 11 major themes, with questions about robot movement being the most frequent. Our analysis of yearly trends revealed that certain themes, such as Specifications, were prominent from 2009 to 2014 but have since diminished in relevance. In contrast, themes like Moving, Actuator, and Remote have consistently dominated discussions over the years. These findings suggest that challenges in robotics may vary over time.
Notably, the majority of robotics questions are framed as How questions, rather than Why or What questions, revealing the lack of enough resources for the practitioners. These insights can help guide researchers and educators in developing effective and timely educational materials for robotics practitioners.

Submitted 5 October, 2024; originally announced October 2024.

arXiv:2407.21124 [pdf, other]

doi 10.1038/s41746-024-01235-0

Zero Shot Health Trajectory Prediction Using Transformer

Authors: Pawel Renc, Yugang Jia, Anthony E. Samir, Jaroslaw Was, Quanzheng Li, David W. Bates, Arkadiusz Sitek

Abstract: Integrating modern machine learning and clinical decision-making has great promise for mitigating healthcare's increasing cost and complexity. We introduce the Enhanced Transformer for Health Outcome Simulation (ETHOS), a novel application of the transformer deep-learning architecture for analyzing high-dimensional, heterogeneous, and episodic health data. ETHOS is trained using Patient Health Timelines (PHTs), which are detailed, tokenized records of health events, to predict future health trajectories, leveraging a zero-shot learning approach. ETHOS represents a significant advancement in foundation model development for healthcare analytics, eliminating the need for labeled data and model fine-tuning. Its ability to simulate various treatment pathways and consider patient-specific factors positions ETHOS as a tool for care optimization and addressing biases in healthcare delivery. Future developments will expand ETHOS' capabilities to incorporate a wider range of data types and data sources. Our work demonstrates a pathway toward accelerated AI development and deployment in healthcare.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2206.14358 [pdf]

doi 10.1093/jamia/ocac114

Using Twitter Data to Understand Public Perceptions of Approved versus Off-label Use for COVID-19-related Medications

Authors: Yining Hua, Hang Jiang, Shixu Lin, Jie Yang, Joseph M. Plasek, David W. Bates, Li Zhou

Abstract: Understanding public discourse on emergency use of unproven therapeutics is crucial for monitoring safe use and combating misinformation. We developed a natural language processing-based pipeline to comprehend public perceptions of and stances on coronavirus disease 2019 (COVID-19)-related drugs on Twitter over time. This retrospective study included 609,189 US-based tweets from January 29, 2020, to November 30, 2021, about four drugs that garnered significant public attention during the COVID-19 pandemic: (1) Hydroxychloroquine and Ivermectin, therapies with anecdotal evidence; and (2) Molnupiravir and Remdesivir, FDA-approved treatments for eligible patients. Time-trend analysis was employed to understand popularity trends and related events. Content and demographic analyses were conducted to explore potential rationales behind people's stances on each drug. Time-trend analysis indicated that Hydroxychloroquine and Ivermectin were discussed more than Molnupiravir and Remdesivir, particularly during COVID-19 surges. Hydroxychloroquine and Ivermectin discussions were highly politicized, related to conspiracy theories, hearsay, and celebrity influences. The distribution of stances between the two major US political parties was significantly different (P < .001); Republicans were more likely to support Hydroxychloroquine (55%) and Ivermectin (30%) than Democrats. People with healthcare backgrounds tended to oppose Hydroxychloroquine (7%) more than the general population, while the general population was more likely to support Ivermectin (14%).
Our study found that social media users have varying perceptions and stances on off-label versus FDA-authorized drug use at different stages of COVID-19. This indicates that health systems, regulatory agencies, and policymakers should design tailored strategies to monitor and reduce misinformation to promote safe drug use.

Submitted 21 January, 2024; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Full paper published in JAMIA

Journal ref: amiajnl-2022-012337.R1

arXiv:2206.06769 [pdf, other]

Muntjac -- Open Source Multicore RV64 Linux-capable SoC

Authors: Xuan Guo, Daniel Bates, Robert Mullins, Alex Bradbury

Abstract: Muntjac is an open-source collection of components which can be used to build a multicore, Linux-capable system-on-chip. This includes a 64-bit RISC-V core, a cache subsystem, and TileLink interconnect allowing cache-coherent multicore configurations. Each component is easy to understand, verify, and extend, with most being configurable enough to be useful across a wide range of applications.

Submitted 14 June, 2022; originally announced June 2022.

Comments: To be published in the First Workshop on Open-Source Computer Architecture Research

arXiv:2205.08978 [pdf, other]

Fast Neural Network based Solving of Partial Differential Equations

Authors: Jaroslaw Rzepecki, Daniel Bates, Chris Doran

Abstract: We present a novel method for using Neural Networks (NNs) for finding solutions to a class of Partial Differential Equations (PDEs). Our method builds on recent advances in Neural Radiance Field research (NeRFs) and allows for a NN to converge to a PDE solution much faster than classic Physically Informed Neural Network (PINNs) approaches.

Submitted 27 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

arXiv:2103.03048 [pdf, other]

Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems

Authors: Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Erdi, Christopher Kanan

Abstract: Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2102.08362 [pdf, other]

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Authors: Dylan Bates

Abstract: Using the policy gradient algorithm, we train a single-hidden-layer neural network to balance a physically accurate simulation of a single inverted pendulum. The trained weights and biases can then be transferred to a physical agent, where they are robust enough to balance a real inverted pendulum. This hybrid approach of training a simulation allows thousands of trial runs to be completed orders of magnitude faster than would be possible in the real world, resulting in greatly reduced training time and more iterations, producing a more robust model. When compared with existing reinforcement learning methods, the resulting control is smoother, learned faster, and able to withstand forced disturbances.

Submitted 6 February, 2021; originally announced February 2021.

Comments: ICAART '21: Proceedings of the 13th International Conference on Agents and Artificial Intelligence, Doctoral Consortium, 2021. 9 pages, 3 figures

ACM Class: I.2.1; I.2.8; I.2.9

arXiv:2102.03670 [pdf, ps, other]

doi 10.1145/3251508

Recommending More Efficient Workflows to Software Developers

Authors: Dylan Bates

Abstract: Existing recommendation systems can help developers improve their software development abilities by recommending new programming tools, such as a refactoring tool or a program navigation tool. However, simply recommending tools in isolation may not, in and of itself, allow developers to successfully complete their tasks. In this paper, I introduce a new recommendation system that recommends workflows, or sequences of tools, to developers. By learning more efficient workflows, the system could make software developers more efficient.

Submitted 6 February, 2021; originally announced February 2021.

Comments: Paper accepted at SPLASH '14: Conference on Systems, Programming, and Applications: Software for Humanity, Student Research Competition, October 2014, Portland, OR., 2 pages

ACM Class: D.2.3; H.3.3; K.3.1

arXiv:2009.09232 [pdf, other]

Learned Low Precision Graph Neural Networks

Authors: Yiren Zhao, Duo Wang, Daniel Bates, Robert Mullins, Mateja Jamnik, Pietro Lio

Abstract: Deep Graph Neural Networks (GNNs) show promising performance on a range of graph tasks, yet at present are costly to run and lack many of the optimisations applied to DNNs. We show, for the first time, how to systematically quantise GNNs with minimal or no loss in performance using Network Architecture Search (NAS). We define the possible quantisation search space of GNNs. The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable. LPGNAS learns the optimal architecture coupled with the best quantisation strategy for different components in the GNN automatically using back-propagation in a single search round. On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes but with similar accuracy to manually designed networks and other NAS results. In particular, on the Pubmed dataset, LPGNAS shows a better size-accuracy Pareto frontier compared to seven other manual and searched baselines, offering a 2.3 times reduction in model size but a 0.4% increase in accuracy when compared to the best NAS competitor. Finally, from our collected quantisation statistics on a wide range of datasets, we suggest a W4A8 (4-bit weights, 8-bit activations) quantisation strategy might be the bottleneck for naive GNN quantisations.

Submitted 19 September, 2020; originally announced September 2020.

arXiv:2006.03463 [pdf, other]

Sponge Examples: Energy-Latency Attacks on Neural Networks

Authors: Ilia Shumailov, Yiren Zhao, Daniel Bates, Nicolas Papernot, Robert Mullins, Ross Anderson

Abstract: The high energy costs of neural network training and inference led to the use of acceleration hardware such as GPUs and TPUs. While this enabled us to train large-scale neural networks in datacenters and deploy them on edge devices, the focus so far is on average-case performance. In this work, we introduce a novel threat vector against neural networks whose energy consumption or decision latency are critical. We show how adversaries can exploit carefully crafted sponge examples, which are inputs designed to maximise energy consumption and latency. We mount two variants of this attack on established vision and language models, increasing energy consumption by a factor of 10 to 200. Our attacks can also be used to delay decisions where a network has critical real-time performance, such as in perception for autonomous vehicles. We demonstrate the portability of our malicious inputs across CPUs and a variety of hardware accelerator chips including GPUs, and an ASIC simulator. We conclude by proposing a defense strategy which mitigates our attack by shifting the analysis of energy consumption in hardware from an average-case to a worst-case perspective.

Submitted 12 May, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: Accepted at 6th IEEE European Symposium on Security and Privacy (EuroS&P)

arXiv:1903.03046 [pdf, other]

Focused Quantization for Sparse CNNs

Authors: Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu

Abstract: Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs pose a challenge in deploying them on constrained devices. Existing compression techniques, while excelling at reducing model sizes, struggle to be computationally friendly. In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on power-of-two values, which exploits the weight distributions after fine-grained pruning. The proposed method dynamically discovers the most effective numerical representation for weights in layers with varying sparsities, significantly reducing model sizes. Multiplications in quantized CNNs are replaced with much cheaper bit-shift operations for efficient inference. Coupled with lossless encoding, we built a compression pipeline that provides CNNs with high compression ratios (CR), low computation cost and minimal loss in accuracy. In ResNet-50, we achieved an 18.08x CR with only 0.24% loss in top-5 accuracy, outperforming existing compression methods. We fully compressed a ResNet-18 and found that it is not only higher in CR and top-5 accuracy, but also more hardware efficient as it requires fewer logic gates to implement when compared to other state-of-the-art quantization methods assuming the same throughput.

Submitted 28 October, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

Comments: To appear in NeurIPS 2019, this is the same paper adapted for viewing on arXiv. TL;DR: Better size/accuracy trade-off of compressed sparse models with focused quantization. 11 pages, 5 figures, 4 tables

arXiv:1601.00894 [pdf, other]

Configurable memory systems for embedded many-core processors

Authors: Daniel Bates, Alex Chadwick, Robert Mullins

Abstract: The memory system of a modern embedded processor consumes a large fraction of total system energy. We explore a range of different configuration options and show that a reconfigurable design can make better use of the resources available to it than any fixed implementation, and provide large improvements in both performance and energy consumption. Reconfigurability becomes increasingly useful as resources become more constrained, so is particularly relevant in the embedded space. For an optimised architectural configuration, we show that a configurable cache system performs an average of 20% (maximum 70%) better than the best fixed implementation when two programs are competing for the same resources, and reduces cache miss rate by an average of 70% (maximum 90%). We then present a case study of AES encryption and decryption, and find that a custom memory configuration can almost double performance, with further benefits being achieved by specialising the task of each core when parallelising the program.

Submitted 7 January, 2016; v1 submitted 5 January, 2016; originally announced January 2016.

Comments: Presented at HIP3ES, 2016

Report number: HIP3ES/2016/2

arXiv:1505.05241 [pdf, ps, other]

Software for the Gale transform of fewnomial systems and a Descartes rule for fewnomials

Authors: Daniel J. Bates, Jonathan D. Hauenstein, Matthew E. Niemerg, Frank Sottile

Abstract: We give a Descartes'-like bound on the number of positive solutions to a system of fewnomials that holds when its exponent vectors are not in convex position and a sign condition is satisfied. This was discovered while developing algorithms and software for computing the Gale transform of a fewnomial system, which is our main goal. This software is a component of a package we are developing for Khovanskii-Rolle continuation, which is a numerical algorithm to compute the real solutions to a system of fewnomials.

Submitted 20 May, 2015; originally announced May 2015.

Comments: 22 pages, 4 figures

MSC Class: 14P99; 65H10; 65H20

ACM Class: G.1.5

arXiv:1310.3297 [pdf, ps, other]

Bertini for Macaulay2

Authors: Daniel J. Bates, Elizabeth Gross, Anton Leykin, Jose Israel Rodriguez

Abstract: Numerical algebraic geometry is the field of computational mathematics concerning the numerical solution of polynomial systems of equations. Bertini, a popular software package for computational applications of this field, includes implementations of a variety of algorithms based on polynomial homotopy continuation. The Macaulay2 package Bertini.m2 provides an interface to Bertini, making it possible to access the core run modes of Bertini in Macaulay2. With these run modes, users can find approximate solutions to zero-dimensional systems and positive-dimensional systems, test numerically whether a point lies on a variety, sample numerically from a variety, and perform parameter homotopy runs.

Submitted 11 October, 2013; originally announced October 2013.

MSC Class: 65H10

Showing 1–16 of 16 results for author: Bates, D