-
Linking convolutional kernel size to generalization bias in face analysis CNNs
Authors:
Hao Liang,
Josue Ortega Caro,
Vikram Maheshri,
Ankit B. Patel,
Guha Balakrishnan
Abstract:
Training dataset biases are by far the most scrutinized factors when explaining algorithmic biases of neural networks. In contrast, hyperparameters related to the neural network architecture have largely been ignored even though different network parameterizations are known to induce different implicit biases over learned features. For example, convolutional kernel size is known to affect the freq…
▽ More
Training dataset biases are by far the most scrutinized factors when explaining algorithmic biases of neural networks. In contrast, hyperparameters related to the neural network architecture have largely been ignored even though different network parameterizations are known to induce different implicit biases over learned features. For example, convolutional kernel size is known to affect the frequency content of features learned in CNNs. In this work, we present a causal framework for linking an architectural hyperparameter to out-of-distribution algorithmic bias. Our framework is experimental, in that we train several versions of a network with an intervention to a specific hyperparameter, and measure the resulting causal effect of this choice on performance bias when a particular out-of-distribution image perturbation is applied. In our experiments, we focused on measuring the causal relationship between convolutional kernel size and face analysis classification bias across different subpopulations (race/gender), with respect to high-frequency image details. We show that modifying kernel size, even in one layer of a CNN, changes the frequency content of learned features significantly across data subgroups leading to biased generalization performance even in the presence of a balanced dataset.
△ Less
Submitted 3 December, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic
Authors:
Wesley A. Suttle,
Amrit Singh Bedi,
Bhrij Patel,
Brian M. Sadler,
Alec Koppel,
Dinesh Manocha
Abstract:
Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, mak…
▽ More
Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, making the step size inoperable. In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. This method, which we call \textbf{M}ulti-level \textbf{A}ctor-\textbf{C}ritic (MAC), is developed especially for infinite-horizon average-reward settings and neither relies on oracle knowledge of the mixing time in its parameter selection nor assumes its exponential decay; it, therefore, is readily applicable to applications with slower mixing times. Nonetheless, it achieves a convergence rate comparable to the state-of-the-art AC algorithms. We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
△ Less
Submitted 1 February, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning
Authors:
Zeel B Patel,
Nipun Batra,
Kevin Murphy
Abstract:
Gaussian processes are Bayesian non-parametric models used in many areas. In this work, we propose a Non-stationary Heteroscedastic Gaussian process model which can be learned with gradient-based techniques. We demonstrate the interpretability of the proposed model by separating the overall uncertainty into aleatoric (irreducible) and epistemic (model) uncertainty. We illustrate the usability of d…
▽ More
Gaussian processes are Bayesian non-parametric models used in many areas. In this work, we propose a Non-stationary Heteroscedastic Gaussian process model which can be learned with gradient-based techniques. We demonstrate the interpretability of the proposed model by separating the overall uncertainty into aleatoric (irreducible) and epistemic (model) uncertainty. We illustrate the usability of derived epistemic uncertainty on active learning problems. We demonstrate the efficacy of our model with various ablations on multiple datasets.
△ Less
Submitted 19 October, 2022;
originally announced October 2022.
-
COVID-19: Optimal Allocation of Ventilator Supply under Uncertainty and Risk
Authors:
Xuecheng Yin,
I. Esra Buyuktahtakin,
Bhumi P. Patel
Abstract:
This study presents a new risk-averse multi-stage stochastic epidemics-ventilator-logistics compartmental model to address the resource allocation challenges of mitigating COVID-19. This epidemiological logistics model involves the uncertainty of untested asymptomatic infections and incorporates short-term human migration. Disease transmission is also forecasted through a new formulation of transm…
▽ More
This study presents a new risk-averse multi-stage stochastic epidemics-ventilator-logistics compartmental model to address the resource allocation challenges of mitigating COVID-19. This epidemiological logistics model involves the uncertainty of untested asymptomatic infections and incorporates short-term human migration. Disease transmission is also forecasted through a new formulation of transmission rates that evolve over space and time with respect to various non-pharmaceutical interventions, such as wearing masks, social distancing, and lockdown. The proposed multi-stage stochastic model overviews different scenarios on the number of asymptomatic individuals while optimizing the distribution of resources, such as ventilators, to minimize the total expected number of newly infected and deceased people. The Conditional Value at Risk (CVaR) is also incorporated into the multi-stage mean-risk model to allow for a trade-off between the weighted expected loss due to the outbreak and the expected risks associated with experiencing disastrous pandemic scenarios. We apply our multi-stage mean-risk epidemics-ventilator-logistics model to the case of controlling the COVID-19 in highly-impacted counties of New York and New Jersey. We calibrate, validate, and test our model using actual infection, population, and migration data. The results indicate that short-term migration influences the transmission of the disease significantly. The optimal number of ventilators allocated to each region depends on various factors, including the number of initial infections, disease transmission rates, initial ICU capacity, the population of a geographical location, and the availability of ventilator supply. Our data-driven modeling framework can be adapted to study the disease transmission dynamics and logistics of other similar epidemics and pandemics.
△ Less
Submitted 9 March, 2021;
originally announced March 2021.
-
Real-time Prediction for Mechanical Ventilation in COVID-19 Patients using A Multi-task Gaussian Process Multi-objective Self-attention Network
Authors:
Kai Zhang,
Siddharth Karanth,
Bela Patel,
Robert Murphy,
Xiaoqian Jiang
Abstract:
We propose a robust in-time predictor for in-hospital COVID-19 patient's probability of requiring mechanical ventilation. A challenge in the risk prediction for COVID-19 patients lies in the great variability and irregular sampling of patient's vitals and labs observed in the clinical setting. Existing methods have strong limitations in handling time-dependent features' complex dynamics, either ov…
▽ More
We propose a robust in-time predictor for in-hospital COVID-19 patient's probability of requiring mechanical ventilation. A challenge in the risk prediction for COVID-19 patients lies in the great variability and irregular sampling of patient's vitals and labs observed in the clinical setting. Existing methods have strong limitations in handling time-dependent features' complex dynamics, either oversimplifying temporal data with summary statistics that lose information or over-engineering features that lead to less robust outcomes. We propose a novel in-time risk trajectory predictive model to handle the irregular sampling rate in the data, which follows the dynamics of risk of performing mechanical ventilation for individual patients. The model incorporates the Multi-task Gaussian Process using observed values to learn the posterior joint multi-variant conditional probability and infer the missing values on a unified time grid. The temporal imputed data is fed into a multi-objective self-attention network for the prediction task. A novel positional encoding layer is proposed and added to the network for producing in-time predictions. The positional layer outputs a risk score at each user-defined time point during the entire hospital stay of an inpatient. We frame the prediction task into a multi-objective learning framework, and the risk scores at all time points are optimized altogether, which adds robustness and consistency to the risk score trajectory prediction. Our experimental evaluation on a large database with nationwide in-hospital patients with COVID-19 also demonstrates that it improved the state-of-the-art performance in terms of AUC (Area Under the receiver operating characteristic Curve) and AUPRC (Area Under the Precision-Recall Curve) performance metrics, especially at early times after hospital admission.
△ Less
Submitted 1 February, 2021;
originally announced February 2021.
-
An Improved Semi-Supervised VAE for Learning Disentangled Representations
Authors:
Weili Nie,
Zichao Wang,
Ankit B. Patel,
Richard G. Baraniuk
Abstract:
Learning interpretable and disentangled representations is a crucial yet challenging task in representation learning. In this work, we focus on semi-supervised disentanglement learning and extend work by Locatello et al. (2019) by introducing another source of supervision that we denote as label replacement. Specifically, during training, we replace the inferred representation associated with a da…
▽ More
Learning interpretable and disentangled representations is a crucial yet challenging task in representation learning. In this work, we focus on semi-supervised disentanglement learning and extend work by Locatello et al. (2019) by introducing another source of supervision that we denote as label replacement. Specifically, during training, we replace the inferred representation associated with a data point with its ground-truth representation whenever it is available. Our extension is theoretically inspired by our proposed general framework of semi-supervised disentanglement learning in the context of VAEs which naturally motivates the supervised terms commonly used in existing semi-supervised VAEs (but not for disentanglement learning). Extensive experiments on synthetic and real datasets demonstrate both quantitatively and qualitatively the ability of our extension to significantly and consistently improve disentanglement with very limited supervision.
△ Less
Submitted 22 June, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction
Authors:
Caroline Wang,
Bin Han,
Bhrij Patel,
Cynthia Rudin
Abstract:
Objectives: We study interpretable recidivism prediction using machine learning (ML) models and analyze performance in terms of prediction ability, sparsity, and fairness. Unlike previous works, this study trains interpretable models that output probabilities rather than binary predictions, and uses quantitative fairness definitions to assess the models. This study also examines whether models can…
▽ More
Objectives: We study interpretable recidivism prediction using machine learning (ML) models and analyze performance in terms of prediction ability, sparsity, and fairness. Unlike previous works, this study trains interpretable models that output probabilities rather than binary predictions, and uses quantitative fairness definitions to assess the models. This study also examines whether models can generalize across geographic locations. Methods: We generated black-box and interpretable ML models on two different criminal recidivism datasets from Florida and Kentucky. We compared predictive performance and fairness of these models against two methods that are currently used in the justice system to predict pretrial recidivism: the Arnold PSA and COMPAS. We evaluated predictive performance of all models on predicting six different types of crime over two time spans. Results: Several interpretable ML models can predict recidivism as well as black-box ML models and are more accurate than COMPAS or the Arnold PSA. These models are potentially useful in practice. Similar to the Arnold PSA, some of these interpretable models can be written down as a simple table. Others can be displayed using a set of visualizations. Our geographic analysis indicates that ML models should be trained separately for separate locations and updated over time. We also present a fairness analysis for the interpretable models. Conclusions: Interpretable machine learning models can perform just as well as non-interpretable methods and currently-used risk assessment scales, in terms of both prediction accuracy and fairness. Machine learning models might be more accurate when trained separately for distinct locations and kept up-to-date.
△ Less
Submitted 11 March, 2022; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Assessing Robustness to Noise: Low-Cost Head CT Triage
Authors:
Sarah M. Hooper,
Jared A. Dunnmon,
Matthew P. Lungren,
Sanjiv Sam Gambhir,
Christopher RĂ©,
Adam S. Wang,
Bhavik N. Patel
Abstract:
Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that ma…
▽ More
Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that may arise when using low-cost scanners, which can be underrepresented in datasets collected from well-funded hospitals. In this work, we investigate how a model trained to triage head computed tomography (CT) scans performs on images acquired with reduced x-ray tube current, fewer projections per gantry rotation, and limited angle scans. These changes can reduce the cost of the scanner and demands on electrical power but come at the expense of increased image noise and artifacts. We first develop a model to triage head CTs and report an area under the receiver operating characteristic curve (AUROC) of 0.77. We then show that the trained model is robust to reduced tube current and fewer projections, with the AUROC dropping only 0.65% for images acquired with a 16x reduction in tube current and 0.22% for images acquired with 8x fewer projections. Finally, for significantly degraded images acquired by a limited angle scan, we show that a model trained specifically to classify such images can overcome the technological limitations to reconstruction and maintain an AUROC within 0.09% of the original model.
△ Less
Submitted 28 March, 2020; v1 submitted 17 March, 2020;
originally announced March 2020.
-
Montage based 3D Medical Image Retrieval from Traumatic Brain Injury Cohort using Deep Convolutional Neural Network
Authors:
Cailey I. Kerley,
Yuankai Huo,
Shikha Chaganti,
Shunxing Bao,
Mayur B. Patel,
Bennett A. Landman
Abstract:
Brain imaging analysis on clinically acquired computed tomography (CT) is essential for the diagnosis, risk prediction of progression, and treatment of the structural phenotypes of traumatic brain injury (TBI). However, in real clinical imaging scenarios, entire body CT images (e.g., neck, abdomen, chest, pelvis) are typically captured along with whole brain CT scans. For instance, in a typical sa…
▽ More
Brain imaging analysis on clinically acquired computed tomography (CT) is essential for the diagnosis, risk prediction of progression, and treatment of the structural phenotypes of traumatic brain injury (TBI). However, in real clinical imaging scenarios, entire body CT images (e.g., neck, abdomen, chest, pelvis) are typically captured along with whole brain CT scans. For instance, in a typical sample of clinical TBI imaging cohort, only ~15% of CT scans actually contain whole brain CT images suitable for volumetric brain analyses; the remaining are partial brain or non-brain images. Therefore, a manual image retrieval process is typically required to isolate the whole brain CT scans from the entire cohort. However, the manual image retrieval is time and resource consuming and even more difficult for the larger cohorts. To alleviate the manual efforts, in this paper we propose an automated 3D medical image retrieval pipeline, called deep montage-based image retrieval (dMIR), which performs classification on 2D montage images via a deep convolutional neural network. The novelty of the proposed method for image processing is to characterize the medical image retrieval task based on the montage images. In a cohort of 2000 clinically acquired TBI scans, 794 scans were used as training data, 206 scans were used as validation data, and the remaining 1000 scans were used as testing data. The proposed achieved accuracy=1.0, recall=1.0, precision=1.0, f1=1.0 for validation data, while achieved accuracy=0.988, recall=0.962, precision=0.962, f1=0.962 for testing data. Thus, the proposed dMIR is able to perform accurate CT whole brain image retrieval from large-scale clinical cohorts.
△ Less
Submitted 10 December, 2018;
originally announced December 2018.
-
Semi-Supervised Learning with the Deep Rendering Mixture Model
Authors:
Tan Nguyen,
Wanjia Liu,
Ethan Perez,
Richard G. Baraniuk,
Ankit B. Patel
Abstract:
Semi-supervised learning algorithms reduce the high cost of acquiring labeled training data by using both labeled and unlabeled data during learning. Deep Convolutional Networks (DCNs) have achieved great success in supervised tasks and as such have been widely employed in the semi-supervised learning. In this paper we leverage the recently developed Deep Rendering Mixture Model (DRMM), a probabil…
▽ More
Semi-supervised learning algorithms reduce the high cost of acquiring labeled training data by using both labeled and unlabeled data during learning. Deep Convolutional Networks (DCNs) have achieved great success in supervised tasks and as such have been widely employed in the semi-supervised learning. In this paper we leverage the recently developed Deep Rendering Mixture Model (DRMM), a probabilistic generative model that models latent nuisance variation, and whose inference algorithm yields DCNs. We develop an EM algorithm for the DRMM to learn from both labeled and unlabeled data. Guided by the theory of the DRMM, we introduce a novel non-negativity constraint and a variational inference term. We report state-of-the-art performance on MNIST and SVHN and competitive results on CIFAR10. We also probe deeper into how a DRMM trained in a semi-supervised setting represents latent nuisance variation using synthetically rendered images. Taken together, our work provides a unified framework for supervised, unsupervised, and semi-supervised learning.
△ Less
Submitted 6 December, 2016;
originally announced December 2016.
-
A Probabilistic Framework for Deep Learning
Authors:
Ankit B. Patel,
Tan Nguyen,
Richard G. Baraniuk
Abstract:
We develop a probabilistic framework for deep learning based on the Deep Rendering Mixture Model (DRMM), a new generative probabilistic model that explicitly capture variations in data due to latent task nuisance variables. We demonstrate that max-sum inference in the DRMM yields an algorithm that exactly reproduces the operations in deep convolutional neural networks (DCNs), providing a first pri…
▽ More
We develop a probabilistic framework for deep learning based on the Deep Rendering Mixture Model (DRMM), a new generative probabilistic model that explicitly capture variations in data due to latent task nuisance variables. We demonstrate that max-sum inference in the DRMM yields an algorithm that exactly reproduces the operations in deep convolutional neural networks (DCNs), providing a first principles derivation. Our framework provides new insights into the successes and shortcomings of DCNs as well as a principled route to their improvement. DRMM training via the Expectation-Maximization (EM) algorithm is a powerful alternative to DCN back-propagation, and initial training results are promising. Classification based on the DRMM and other variants outperforms DCNs in supervised digit classification, training 2-3x faster while achieving similar accuracy. Moreover, the DRMM is applicable to semi-supervised and unsupervised learning tasks, achieving results that are state-of-the-art in several categories on the MNIST benchmark and comparable to state of the art on the CIFAR10 benchmark.
△ Less
Submitted 6 December, 2016;
originally announced December 2016.
-
A Deep Learning Approach to Structured Signal Recovery
Authors:
Ali Mousavi,
Ankit B. Patel,
Richard G. Baraniuk
Abstract:
In this paper, we develop a new framework for sensing and recovering structured signals. In contrast to compressive sensing (CS) systems that employ linear measurements, sparse representations, and computationally complex convex/greedy algorithms, we introduce a deep learning framework that supports both linear and mildly nonlinear measurements, that learns a structured representation from trainin…
▽ More
In this paper, we develop a new framework for sensing and recovering structured signals. In contrast to compressive sensing (CS) systems that employ linear measurements, sparse representations, and computationally complex convex/greedy algorithms, we introduce a deep learning framework that supports both linear and mildly nonlinear measurements, that learns a structured representation from training data, and that efficiently computes a signal estimate. In particular, we apply a stacked denoising autoencoder (SDA), as an unsupervised feature learner. SDA enables us to capture statistical dependencies between the different elements of certain signals and improve signal recovery performance as compared to the CS approach.
△ Less
Submitted 17 August, 2015;
originally announced August 2015.
-
A Probabilistic Theory of Deep Learning
Authors:
Ankit B. Patel,
Tan Nguyen,
Richard G. Baraniuk
Abstract:
A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks that are complicated by nuisance variation. For instance, visual object recognition involves the unknown object position, orientation, and scale in object recognition while speech recognition involves the unknown voice pronunciation, pitch, and speed. R…
▽ More
A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks that are complicated by nuisance variation. For instance, visual object recognition involves the unknown object position, orientation, and scale in object recognition while speech recognition involves the unknown voice pronunciation, pitch, and speed. Recently, a new breed of deep learning algorithms have emerged for high-nuisance inference tasks that routinely yield pattern recognition systems with near- or super-human capabilities. But a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on the Deep Rendering Model: a generative probabilistic model that explicitly captures latent nuisance variation. By relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks and random decision forests, providing insights into their successes and shortcomings, as well as a principled route to their improvement.
△ Less
Submitted 2 April, 2015;
originally announced April 2015.