-
Compact Bayesian Neural Networks via pruned MCMC sampling
Authors:
Ratneel Deo,
Scott Sisson,
Jody M. Webster,
Rohitash Chandra
Abstract:
Bayesian Neural Networks (BNNs) offer robust uncertainty quantification in model predictions, but training them presents a significant computational challenge. This is mainly due to the problem of sampling multimodal posterior distributions using Markov Chain Monte Carlo (MCMC) sampling and variational inference algorithms. Moreover, the number of model parameters scales exponentially with additio…
▽ More
Bayesian Neural Networks (BNNs) offer robust uncertainty quantification in model predictions, but training them presents a significant computational challenge. This is mainly due to the problem of sampling multimodal posterior distributions using Markov Chain Monte Carlo (MCMC) sampling and variational inference algorithms. Moreover, the number of model parameters scales exponentially with additional hidden layers, neurons, and features in the dataset. Typically, a significant portion of these densely connected parameters are redundant and pruning a neural network not only improves portability but also has the potential for better generalisation capabilities. In this study, we address some of the challenges by leveraging MCMC sampling with network pruning to obtain compact probabilistic models having removed redundant parameters. We sample the posterior distribution of model parameters (weights and biases) and prune weights with low importance, resulting in a compact model. We ensure that the compact BNN retains its ability to estimate uncertainty via the posterior distribution while retaining the model training and generalisation performance accuracy by adapting post-pruning resampling. We evaluate the effectiveness of our MCMC pruning strategy on selected benchmark datasets for regression and classification problems through empirical result analysis. We also consider two coral reef drill-core lithology classification datasets to test the robustness of the pruning model in complex real-world datasets. We further investigate if refining compact BNN can retain any loss of performance. Our results demonstrate the feasibility of training and pruning BNNs using MCMC whilst retaining generalisation performance with over 75% reduction in network size. This paves the way for developing compact BNN models that provide uncertainty estimates for real-world applications.
△ Less
Submitted 12 January, 2025;
originally announced January 2025.
-
NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text
Authors:
Prajwal Kailas,
Max Homilius,
Rahul C. Deo,
Calum A. MacRae
Abstract:
Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patients status. Thus, accurate automated diagn…
▽ More
Accurate diagnostic coding of medical notes is crucial for enhancing patient care, medical research, and error-free billing in healthcare organizations. Manual coding is a time-consuming task for providers, and diagnostic codes often exhibit low sensitivity and specificity, whereas the free text in medical notes can be a more precise description of a patients status. Thus, accurate automated diagnostic coding of medical notes has become critical for a learning healthcare system. Recent developments in long-document transformer architectures have enabled attention-based deep-learning models to adjudicate medical notes. In addition, contrastive loss functions have been used to jointly pre-train large language and image models with noisy labels. To further improve the automated adjudication of medical notes, we developed an approach based on i) models for ICD-10 diagnostic code sequences using a large real-world data set, ii) large language models for medical notes, and iii) contrastive pre-training to build an integrated model of both ICD-10 diagnostic codes and corresponding medical text. We demonstrate that a contrastive approach for pre-training improves performance over prior state-of-the-art models for the MIMIC-III-50, MIMIC-III-rare50, and MIMIC-III-full diagnostic coding tasks.
△ Less
Submitted 16 December, 2024;
originally announced December 2024.
-
Application of Quantum Pre-Processing Filter for Binary Image Classification with Small Samples
Authors:
Farina Riaz,
Shahab Abdulla,
Hajime Suzuki,
Srinjoy Ganguly,
Ravinesh C. Deo,
Susan Hopkins
Abstract:
Over the past few years, there has been significant interest in Quantum Machine Learning (QML) among researchers, as it has the potential to transform the field of machine learning. Several models that exploit the properties of quantum mechanics have been developed for practical applications. In this study, we investigated the application of our previously proposed quantum pre-processing filter (Q…
▽ More
Over the past few years, there has been significant interest in Quantum Machine Learning (QML) among researchers, as it has the potential to transform the field of machine learning. Several models that exploit the properties of quantum mechanics have been developed for practical applications. In this study, we investigated the application of our previously proposed quantum pre-processing filter (QPF) to binary image classification. We evaluated the QPF on four datasets: MNIST (handwritten digits), EMNIST (handwritten digits and alphabets), CIFAR-10 (photographic images) and GTSRB (real-life traffic sign images). Similar to our previous multi-class classification results, the application of QPF improved the binary image classification accuracy using neural network against MNIST, EMNIST, and CIFAR-10 from 98.9% to 99.2%, 97.8% to 98.3%, and 71.2% to 76.1%, respectively, but degraded it against GTSRB from 93.5% to 92.0%. We then applied QPF in cases using a smaller number of training and testing samples, i.e. 80 and 20 samples per class, respectively. In order to derive statistically stable results, we conducted the experiment with 100 trials choosing randomly different training and testing samples and averaging the results. The result showed that the application of QPF did not improve the image classification accuracy against MNIST and EMNIST but improved it against CIFAR-10 and GTSRB from 65.8% to 67.2% and 90.5% to 91.8%, respectively. Further research will be conducted as part of future work to investigate the potential of QPF to assess the scalability of the proposed approach to larger and complex datasets.
△ Less
Submitted 16 December, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Development of a Novel Quantum Pre-processing Filter to Improve Image Classification Accuracy of Neural Network Models
Authors:
Farina Riaz,
Shahab Abdulla,
Hajime Suzuki,
Srinjoy Ganguly,
Ravinesh C. Deo,
Susan Hopkins
Abstract:
This paper proposes a novel quantum pre-processing filter (QPF) to improve the image classification accuracy of neural network (NN) models. A simple four qubit quantum circuit that uses Y rotation gates for encoding and two controlled NOT gates for creating correlation among the qubits is applied as a feature extraction filter prior to passing data into the fully connected NN architecture. By appl…
▽ More
This paper proposes a novel quantum pre-processing filter (QPF) to improve the image classification accuracy of neural network (NN) models. A simple four qubit quantum circuit that uses Y rotation gates for encoding and two controlled NOT gates for creating correlation among the qubits is applied as a feature extraction filter prior to passing data into the fully connected NN architecture. By applying the QPF approach, the results show that the image classification accuracy based on the MNIST (handwritten 10 digits) and the EMNIST (handwritten 47 class digits and letters) datasets can be improved, from 92.5% to 95.4% and from 68.9% to 75.9%, respectively. These improvements were obtained without introducing extra model parameters or optimizations in the machine learning process. However, tests performed on the developed QPF approach against a relatively complex GTSRB dataset with 43 distinct class real-life traffic sign images showed a degradation in the classification accuracy. Considering this result, further research into the understanding and the design of a more suitable quantum circuit approach for image classification neural networks could be explored utilizing the baseline method proposed in this paper.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Evaluation of Non-Collocated Force Feedback Driven by Signal-Independent Noise
Authors:
Zonghe Chua,
Allison M. Okamura,
Darrel R. Deo
Abstract:
Individuals living with paralysis or amputation can operate robotic prostheses using input signals based on their intent or attempt to move. Because sensory function is lost or diminished in these individuals, haptic feedback must be non-collocated. The intracortical brain computer interface (iBCI) has enabled a variety of neural prostheses for people with paralysis. An important attribute of the…
▽ More
Individuals living with paralysis or amputation can operate robotic prostheses using input signals based on their intent or attempt to move. Because sensory function is lost or diminished in these individuals, haptic feedback must be non-collocated. The intracortical brain computer interface (iBCI) has enabled a variety of neural prostheses for people with paralysis. An important attribute of the iBCI is that its input signal contains signal-independent noise. To understand the effects of signal-independent noise on a system with non-collocated haptic feedback and inform iBCI-based prostheses control strategies, we conducted an experiment with a conventional haptic interface as a proxy for the iBCI. Able-bodied users were tasked with locating an indentation within a virtual environment using input from their right hand. Non-collocated haptic feedback of the interaction forces in the virtual environment was augmented with noise of three different magnitudes and simultaneously rendered on users' left hands. We found increases in distance error of the guess of the indentation location, mean time per trial, mean peak absolute displacement and speed of tool movements during localization for the highest noise level compared to the other two levels. The findings suggest that users have a threshold of disturbance rejection and that they attempt to increase their signal-to-noise ratio through their exploratory actions.
△ Less
Submitted 22 May, 2020;
originally announced May 2020.
-
Langevin-gradient parallel tempering for Bayesian neural learning
Authors:
Rohitash Chandra,
Konark Jain,
Ratneel V. Deo,
Sally Cripps
Abstract:
Bayesian neural learning feature a rigorous approach to estimation and uncertainty quantification via the posterior distribution of weights that represent knowledge of the neural network. This not only provides point estimates of optimal set of weights but also the ability to quantify uncertainty in decision making using the posterior distribution. Markov chain Monte Carlo (MCMC) techniques are ty…
▽ More
Bayesian neural learning feature a rigorous approach to estimation and uncertainty quantification via the posterior distribution of weights that represent knowledge of the neural network. This not only provides point estimates of optimal set of weights but also the ability to quantify uncertainty in decision making using the posterior distribution. Markov chain Monte Carlo (MCMC) techniques are typically used to obtain sample-based estimates of the posterior distribution. However, these techniques face challenges in convergence and scalability, particularly in settings with large datasets and network architectures. This paper address these challenges in two ways. First, parallel tempering is used used to explore multiple modes of the posterior distribution and implemented in multi-core computing architecture. Second, we make within-chain sampling schemes more efficient by using Langevin gradient information in forming Metropolis-Hastings proposal distributions. We demonstrate the techniques using time series prediction and pattern classification applications. The results show that the method not only improves the computational time, but provides better prediction or decision making capabilities when compared to related methods.
△ Less
Submitted 10 November, 2018;
originally announced November 2018.
-
Automated and Interpretable Patient ECG Profiles for Disease Detection, Tracking, and Discovery
Authors:
Geoffrey H. Tison,
Jeffrey Zhang,
Francesca N. Delling,
Rahul C. Deo
Abstract:
The electrocardiogram or ECG has been in use for over 100 years and remains the most widely performed diagnostic test to characterize cardiac structure and electrical activity. We hypothesized that parallel advances in computing power, innovations in machine learning algorithms, and availability of large-scale digitized ECG data would enable extending the utility of the ECG beyond its current limi…
▽ More
The electrocardiogram or ECG has been in use for over 100 years and remains the most widely performed diagnostic test to characterize cardiac structure and electrical activity. We hypothesized that parallel advances in computing power, innovations in machine learning algorithms, and availability of large-scale digitized ECG data would enable extending the utility of the ECG beyond its current limitations, while at the same time preserving interpretability, which is fundamental to medical decision-making. We identified 36,186 ECGs from the UCSF database that were 1) in normal sinus rhythm and 2) would enable training of specific models for estimation of cardiac structure or function or detection of disease. We derived a novel model for ECG segmentation using convolutional neural networks (CNN) and Hidden Markov Models (HMM) and evaluated its output by comparing electrical interval estimates to 141,864 measurements from the clinical workflow. We built a 725-element patient-level ECG profile using downsampled segmentation data and trained machine learning models to estimate left ventricular mass, left atrial volume, mitral annulus e' and to detect and track four diseases: pulmonary arterial hypertension (PAH), hypertrophic cardiomyopathy (HCM), cardiac amyloid (CA), and mitral valve prolapse (MVP). CNN-HMM derived ECG segmentation agreed with clinical estimates, with median absolute deviations (MAD) as a fraction of observed value of 0.6% for heart rate and 4% for QT interval. Patient-level ECG profiles enabled quantitative estimates of left ventricular and mitral annulus e' velocity with good discrimination in binary classification models of left ventricular hypertrophy and diastolic function. Models for disease detection ranged from AUROC of 0.94 to 0.77 for MVP. Top-ranked variables for all models included known ECG characteristics along with novel predictors of these traits/diseases.
△ Less
Submitted 6 July, 2018;
originally announced July 2018.
-
Multi-core parallel tempering Bayeslands for basin and landscape evolution
Authors:
Rohitash Chandra,
R. Dietmar Müller,
Danial Azam,
Ratneel Deo,
Nathaniel Butterworth,
Tristan Salles,
Sally Cripps
Abstract:
The Bayesian paradigm is becoming an increasingly popular framework for estimation and uncertainty quantification of unknown parameters in geo-physical inversion problems. Badlands is a basin and landscape evolution forward model for simulating topography evolution at a large range of spatial and time scales. Our previous work presented Bayeslands that used the Bayesian paradigm to make inference…
▽ More
The Bayesian paradigm is becoming an increasingly popular framework for estimation and uncertainty quantification of unknown parameters in geo-physical inversion problems. Badlands is a basin and landscape evolution forward model for simulating topography evolution at a large range of spatial and time scales. Our previous work presented Bayeslands that used the Bayesian paradigm to make inference for unknown parameters in the Badlands model using Markov chain Monte Carlo (MCMC) sampling. Bayeslands faced challenges in convergence due to multi-modal posterior distributions in the selected parameters of Badlands. Parallel tempering is an advanced MCMC method suited for irregular and multi-modal posterior distributions. In this paper, we extend Bayeslands using parallel tempering (PT-Bayeslands) with high performance computing to address previous limitations in parameter space exploration in the context of the computationally expensive Badlands model. Our results show that PT-Bayeslands not only reduces the computation time, but also provides an improvement of the sampling for multi-modal posterior distributions. This provides an improvement over Bayeslands which used single chain MCMC that face difficulties in convergence and can lead to misleading inference. This motivates its usage in large-scale basin and landscape evolution models.
△ Less
Submitted 19 July, 2019; v1 submitted 22 June, 2018;
originally announced June 2018.
-
Stacked transfer learning for tropical cyclone intensity prediction
Authors:
Ratneel Vikash Deo,
Rohitash Chandra,
Anuraganand Sharma
Abstract:
Tropical cyclone wind-intensity prediction is a challenging task considering drastic changes climate patterns over the last few decades. In order to develop robust prediction models, one needs to consider different characteristics of cyclones in terms of spatial and temporal characteristics. Transfer learning incorporates knowledge from a related source dataset to compliment a target datasets espe…
▽ More
Tropical cyclone wind-intensity prediction is a challenging task considering drastic changes climate patterns over the last few decades. In order to develop robust prediction models, one needs to consider different characteristics of cyclones in terms of spatial and temporal characteristics. Transfer learning incorporates knowledge from a related source dataset to compliment a target datasets especially in cases where there is lack or data. Stacking is a form of ensemble learning focused for improving generalization that has been recently used for transfer learning problems which is referred to as transfer stacking. In this paper, we employ transfer stacking as a means of studying the effects of cyclones whereby we evaluate if cyclones in different geographic locations can be helpful for improving generalization performs. Moreover, we use conventional neural networks for evaluating the effects of duration on cyclones in prediction performance. Therefore, we develop an effective strategy that evaluates the relationships between different types of cyclones through transfer learning and conventional learning methods via neural networks.
△ Less
Submitted 22 August, 2017;
originally announced August 2017.
-
A Computer Vision Pipeline for Automated Determination of Cardiac Structure and Function and Detection of Disease by Two-Dimensional Echocardiography
Authors:
Jeffrey Zhang,
Sravani Gajjala,
Pulkit Agrawal,
Geoffrey H. Tison,
Laura A. Hallock,
Lauren Beussink-Nelson,
Eugene Fan,
Mandar A. Aras,
ChaRandle Jordan,
Kirsten E. Fleischmann,
Michelle Melisko,
Atif Qasim,
Alexei Efros,
Sanjiv J. Shah,
Ruzena Bajcsy,
Rahul C. Deo
Abstract:
Automated cardiac image interpretation has the potential to transform clinical practice in multiple ways including enabling low-cost serial assessment of cardiac function in the primary care and rural setting. We hypothesized that advances in computer vision could enable building a fully automated, scalable analysis pipeline for echocardiogram (echo) interpretation. Our approach entailed: 1) prepr…
▽ More
Automated cardiac image interpretation has the potential to transform clinical practice in multiple ways including enabling low-cost serial assessment of cardiac function in the primary care and rural setting. We hypothesized that advances in computer vision could enable building a fully automated, scalable analysis pipeline for echocardiogram (echo) interpretation. Our approach entailed: 1) preprocessing; 2) convolutional neural networks (CNN) for view identification, image segmentation, and phasing of the cardiac cycle; 3) quantification of chamber volumes and left ventricular mass; 4) particle tracking to compute longitudinal strain; and 5) targeted disease detection. CNNs accurately identified views (e.g. 99% for apical 4-chamber) and segmented individual cardiac chambers. Cardiac structure measurements agreed with study report values (e.g. mean absolute deviations (MAD) of 7.7 mL/kg/m2 for left ventricular diastolic volume index, 2918 studies). We computed automated ejection fraction and longitudinal strain measurements (within 2 cohorts), which agreed with commercial software-derived values [for ejection fraction, MAD=5.3%, N=3101 studies; for strain, MAD=1.5% (n=197) and 1.6% (n=110)], and demonstrated applicability to serial monitoring of breast cancer patients for trastuzumab cardiotoxicity. Overall, we found that, compared to manual measurements, automated measurements had superior performance across seven internal consistency metrics with an average increase in the Spearman correlation coefficient of 0.05 (p=0.02). Finally, we developed disease detection algorithms for hypertrophic cardiomyopathy and cardiac amyloidosis, with C-statistics of 0.93 and 0.84, respectively. Our pipeline lays the groundwork for using automated interpretation to support point-of-care handheld cardiac ultrasound and large-scale analysis of the millions of echos archived within healthcare systems.
△ Less
Submitted 12 January, 2018; v1 submitted 22 June, 2017;
originally announced June 2017.