Search | arXiv e-print repository

Diagnosing failures of fairness transfer across distribution shift in real-world medical settings

Authors: Jessica Schrouff, Natalie Harris, Oluwasanmi Koyejo, Ibrahim Alabdulmohsin, Eva Schnider, Krista Opsahl-Ong, Alex Brown, Subhrajit Roy, Diana Mincu, Christina Chen, Awa Dieng, Yuan Liu, Vivek Natarajan, Alan Karthikesalingam, Katherine Heller, Silvia Chiappa, Alexander D'Amour

Abstract: Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is enco… ▽ More Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two medical applications, we show that this knowledge can help diagnose failures of fairness transfer, including cases where real-world shifts are more complex than is often assumed in the literature. Based on these results, we discuss potential remedies at each step of the machine learning pipeline. △ Less

Submitted 10 February, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2106.15980 [pdf, other]

Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence

Authors: Ghassen Jerfel, Serena Wang, Clara Fannjiang, Katherine A. Heller, Yian Ma, Michael I. Jordan

Abstract: Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tail of the posterior leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates… ▽ More Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tail of the posterior leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference by constructing an IS proposal distribution through the minimization of a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and a fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC. △ Less

Submitted 30 June, 2021; originally announced June 2021.

Comments: Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)

arXiv:2101.06536 [pdf, other]

Deep Cox Mixtures for Survival Regression

Authors: Chirag Nagpal, Steve Yadlowsky, Negar Rostamzadeh, Katherine Heller

Abstract: Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, due to, for example, loss to follow up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models being amongst the mo… ▽ More Survival analysis is a challenging variation of regression modeling because of the presence of censoring, where the outcome measurement is only partially known, due to, for example, loss to follow up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression models being amongst the most commonly employed models. We describe a new approach for survival analysis regression models, based on learning mixtures of Cox regressions to model individual survival distributions. We propose an approximation to the Expectation Maximization algorithm for this model that does hard assignments to mixture groups to make optimization efficient. In each group assignment, we fit the hazard ratios within each group using deep neural networks, and the baseline hazard for each mixture component non-parametrically. We perform experiments on multiple real world datasets, and look at the mortality rates of patients across ethnicity and gender. We emphasize the importance of calibration in healthcare settings and demonstrate that our approach outperforms classical and modern survival analysis baselines, both in terms of discriminative performance and calibration, with large gains in performance on the minority demographics. △ Less

Submitted 26 June, 2022; v1 submitted 16 January, 2021; originally announced January 2021.

Comments: Machine Learning for Healthcare Conference, 2021

Journal ref: Proceedings of the 6th Machine Learning for Healthcare Conference, PMLR 149:674-708, 2021

arXiv:2011.03395 [pdf, other]

Underspecification Presents Challenges for Credibility in Modern Machine Learning

Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne , et al. (15 additional authors not shown)

Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predict… ▽ More ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain. △ Less

Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: Updates: Updated statistical analysis in Section 6; Additional citations

arXiv:2005.07186 [pdf, other]

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Authors: Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Abstract: Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from effic… ▽ More Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants. △ Less

Submitted 14 August, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: Published in the International Conference on Machine Learning (ICML) 2020. Code available at https://github.com/google/edward2

arXiv:1911.05861 [pdf, other]

Federated and Differentially Private Learning for Electronic Health Records

Authors: Stephen R. Pfohl, Andrew M. Dai, Katherine Heller

Abstract: The use of collaborative and decentralized machine learning techniques such as federated learning have the potential to enable the development and deployment of clinical risk predictions models in low-resource settings without requiring sensitive data be shared or stored in a central repository. This process necessitates communication of model weights or updates between collaborating entities, but… ▽ More The use of collaborative and decentralized machine learning techniques such as federated learning have the potential to enable the development and deployment of clinical risk predictions models in low-resource settings without requiring sensitive data be shared or stored in a central repository. This process necessitates communication of model weights or updates between collaborating entities, but it is unclear to what extent patient privacy is compromised as a result. To gain insight into this question, we study the efficacy of centralized versus federated learning in both private and non-private settings. The clinical prediction tasks we consider are the prediction of prolonged length of stay and in-hospital mortality across thirty one hospitals in the eICU Collaborative Research Database. We find that while it is straightforward to apply differentially private stochastic gradient descent to achieve strong privacy bounds when training in a centralized setting, it is considerably more difficult to do so in the federated setting. △ Less

Submitted 13 November, 2019; originally announced November 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1906.03842 [pdf, other]

doi 10.1145/3368555.3384457

Analyzing the Role of Model Uncertainty for Electronic Health Records

Authors: Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai

Abstract: In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertaint… ▽ More In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients is an area with clear clinical impact, such as determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups. △ Less

Submitted 25 March, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: Published in the ACM Conference on Health, Inference, and Learning (CHIL) 2020. Code available at https://github.com/Google-Health/records-research

arXiv:1812.06080 [pdf, other]

Reconciling meta-learning and continual learning with online mixtures of tasks

Authors: Ghassen Jerfel, Erin Grant, Thomas L. Griffiths, Katherine Heller

Abstract: Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably dissimilar or change over time. We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hie… ▽ More Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably dissimilar or change over time. We use the connection between gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet process mixture of hierarchical Bayesian models over the parameters of an arbitrary parametric model such as a neural network. In contrast to consolidating inductive biases into a single set of hyperparameters, our approach of task-dependent hyperparameter selection better handles latent distribution shift, as demonstrated on a set of evolving, image-based, few-shot learning benchmarks. △ Less

Submitted 19 June, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: updated experimental results

arXiv:1807.09237 [pdf, other]

Hierarchical infinite factor model for improving the prediction of surgical complications for geriatric patients

Authors: Elizabeth Lorenzi, Ricardo Henao, Katherine Heller

Abstract: We develop a hierarchical infinite latent factor model (HIFM) to appropriately account for the covariance structure across subpopulations in data. We propose a novel Hierarchical Dirichlet Process shrinkage prior on the loadings matrix that flexibly captures the underlying structure of our data across subpopulations while sharing information to improve inference and prediction. The stick-breaking… ▽ More We develop a hierarchical infinite latent factor model (HIFM) to appropriately account for the covariance structure across subpopulations in data. We propose a novel Hierarchical Dirichlet Process shrinkage prior on the loadings matrix that flexibly captures the underlying structure of our data across subpopulations while sharing information to improve inference and prediction. The stick-breaking construction of the prior assumes infinite number of factors and allows for each subpopulation to utilize different subsets of the factor space and select the number of factors needed to best explain the variation. Theoretical results are provided to show support of the prior. We develop the model into a latent factor regression method that excels at prediction and inference of regression coefficients. Simulations are used to validate this strong performance compared to baseline methods. We apply this work to the problem of predicting surgical complications using electronic health record data for geriatric patients at Duke University Health System (DUHS). We utilize additional surgical encounters at DUHS to enhance learning for the targeted patients. Using HIFM to identify high risk patients improves the sensitivity of predicting death to 91% from 35% based on the currently used heuristic. △ Less

Submitted 24 July, 2018; originally announced July 2018.

arXiv:1806.06696 [pdf, other]

SMOGS: Social Network Metrics of Game Success

Authors: Fan Bu, Sonia Xu, Katherine Heller, Alexander Volfovsky

Abstract: This paper develops metrics from a social network perspective that are directly translatable to the outcome of a basketball game. We extend a state-of-the-art multi-resolution stochastic process approach to modeling basketball by modeling passes between teammates as directed dynamic relational links on a network and introduce multiplicative latent factors to study higher-order patterns in players'… ▽ More This paper develops metrics from a social network perspective that are directly translatable to the outcome of a basketball game. We extend a state-of-the-art multi-resolution stochastic process approach to modeling basketball by modeling passes between teammates as directed dynamic relational links on a network and introduce multiplicative latent factors to study higher-order patterns in players' interactions that distinguish a successful game from a loss. Parameters are estimated using a Markov chain Monte Carlo sampler. Results in simulation experiments suggest that the sampling scheme is effective in recovering the parameters. We then apply the model to the first high-resolution optical tracking dataset collected in college basketball games. The learned latent factors demonstrate significant differences between players' passing and receiving tendencies in a loss than those in a win. The model is applicable to team sports other than basketball, as well as other time-varying network observations. △ Less

Submitted 18 June, 2018; originally announced June 2018.

Journal ref: PMLR 2019 89:2406-2414

arXiv:1708.05894 [pdf, other]

An Improved Multi-Output Gaussian Process RNN with Real-Time Validation for Early Sepsis Detection

Authors: Joseph Futoma, Sanjay Hariharan, Mark Sendak, Nathan Brajer, Meredith Clement, Armando Bedoya, Cara O'Brien, Katherine Heller

Abstract: Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improves patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinica… ▽ More Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improves patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinical time series with multi-output Gaussian processes, maintaining uncertainty about the physiological state of a patient while also imputing missing values. The mean function takes into account the effects of medications administered on the trajectories of the physiological variables. Latent function values from the Gaussian process are then fed into a deep recurrent neural network to classify patient encounters as septic or not, and the overall model is trained end-to-end using back-propagation. We train and validate our model on a large dataset of 18 months of heterogeneous inpatient stays from the Duke University Health System, and develop a new "real-time" validation scheme for simulating the performance of our model as it will actually be used. Our proposed method substantially outperforms clinical baselines, and improves on a previous related model for detecting sepsis. Our model's predictions will be displayed in a real-time analytics dashboard to be used by a sepsis rapid response team to help detect and improve treatment of sepsis. △ Less

Submitted 19 August, 2017; originally announced August 2017.

Comments: Presented at Machine Learning for Healthcare 2017, Boston, MA

arXiv:1706.04152 [pdf, other]

Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier

Authors: Joseph Futoma, Sanjay Hariharan, Katherine Heller

Abstract: We present a scalable end-to-end classifier that uses streaming physiological and medication data to accurately predict the onset of sepsis, a life-threatening complication from infections that has high mortality and morbidity. Our proposed framework models the multivariate trajectories of continuous-valued physiological time series using multitask Gaussian processes, seamlessly accounting for the… ▽ More We present a scalable end-to-end classifier that uses streaming physiological and medication data to accurately predict the onset of sepsis, a life-threatening complication from infections that has high mortality and morbidity. Our proposed framework models the multivariate trajectories of continuous-valued physiological time series using multitask Gaussian processes, seamlessly accounting for the high uncertainty, frequent missingness, and irregular sampling rates typically associated with real clinical data. The Gaussian process is directly connected to a black-box classifier that predicts whether a patient will become septic, chosen in our case to be a recurrent neural network to account for the extreme variability in the length of patient encounters. We show how to scale the computations associated with the Gaussian process in a manner so that the entire system can be discriminatively trained end-to-end using backpropagation. In a large cohort of heterogeneous inpatient encounters at our university health system we find that it outperforms several baselines at predicting sepsis, and yields 19.4% and 55.5% improved areas under the Receiver Operating Characteristic and Precision Recall curves as compared to the NEWS score currently used by our hospital. △ Less

Submitted 13 June, 2017; originally announced June 2017.

Comments: Presented at 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia

arXiv:1612.00555 [pdf, other]

Transfer Learning via Latent Factor Modeling to Improve Prediction of Surgical Complications

Authors: Elizabeth C Lorenzi, Zhifei Sun, Erich Huang, Ricardo Henao, Katherine A Heller

Abstract: We aim to create a framework for transfer learning using latent factor models to learn the dependence structure between a larger source dataset and a target dataset. The methodology is motivated by our goal of building a risk-assessment model for surgery patients, using both institutional and national surgical outcomes data. The national surgical outcomes data is collected through NSQIP (National… ▽ More We aim to create a framework for transfer learning using latent factor models to learn the dependence structure between a larger source dataset and a target dataset. The methodology is motivated by our goal of building a risk-assessment model for surgery patients, using both institutional and national surgical outcomes data. The national surgical outcomes data is collected through NSQIP (National Surgery Quality Improvement Program), a database housing almost 4 million patients from over 700 different hospitals. We build a latent factor model with a hierarchical prior on the loadings matrix to appropriately account for the different covariance structure in our data. We extend this model to handle more complex relationships between the populations by deriving a scale mixture formulation using stick-breaking properties. Our model provides a transfer learning framework that utilizes all information from both the source and target data, while modeling the underlying inherent differences between them. △ Less

Submitted 1 December, 2016; originally announced December 2016.

arXiv:1608.04615 [pdf, other]

Scalable Modeling of Multivariate Longitudinal Data for Prediction of Chronic Kidney Disease Progression

Authors: Joseph Futoma, Mark Sendak, C. Blake Cameron, Katherine Heller

Abstract: Prediction of the future trajectory of a disease is an important challenge for personalized medicine and population health management. However, many complex chronic diseases exhibit large degrees of heterogeneity, and furthermore there is not always a single readily available biomarker to quantify disease severity. Even when such a clinical variable exists, there are often additional related bioma… ▽ More Prediction of the future trajectory of a disease is an important challenge for personalized medicine and population health management. However, many complex chronic diseases exhibit large degrees of heterogeneity, and furthermore there is not always a single readily available biomarker to quantify disease severity. Even when such a clinical variable exists, there are often additional related biomarkers routinely measured for patients that may better inform the predictions of their future disease state. To this end, we propose a novel probabilistic generative model for multivariate longitudinal data that captures dependencies between multivariate trajectories. We use a Gaussian process based regression model for each individual trajectory, and build off ideas from latent class models to induce dependence between their mean functions. We fit our method using a scalable variational inference algorithm to a large dataset of longitudinal electronic patient health records, and find that it improves dynamic predictions compared to a recent state of the art method. Our local accountable care organization then uses the model predictions during chart reviews of high risk patients with chronic kidney disease. △ Less

Submitted 16 August, 2016; originally announced August 2016.

Comments: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA

arXiv:1604.07031 [pdf, other]

Predictive Hierarchical Clustering: Learning clusters of CPT codes for improving surgical outcomes

Authors: Elizabeth C. Lorenzi, Stephanie L. Brown, Zhifei Sun, Katherine Heller

Abstract: We develop a novel algorithm, Predictive Hierarchical Clustering (PHC), for agglomerative hierarchical clustering of current procedural terminology (CPT) codes. Our predictive hierarchical clustering aims to cluster subgroups, not individual observations, found within our data, such that the clusters discovered result in optimal performance of a classification model. Therefore, merges are chosen b… ▽ More We develop a novel algorithm, Predictive Hierarchical Clustering (PHC), for agglomerative hierarchical clustering of current procedural terminology (CPT) codes. Our predictive hierarchical clustering aims to cluster subgroups, not individual observations, found within our data, such that the clusters discovered result in optimal performance of a classification model. Therefore, merges are chosen based on a Bayesian hypothesis test, which chooses pairings of the subgroups that result in the best model fit, as measured by held out predictive likelihoods. We place a Dirichlet prior on the probability of merging clusters, allowing us to adjust the size and sparsity of clusters. The motivation is to predict patient-specific surgical outcomes using data from ACS NSQIP (American College of Surgeon's National Surgical Quality Improvement Program). An important predictor of surgical outcomes is the actual surgical procedure performed as described by a CPT code. We use PHC to cluster CPT codes, represented as subgroups, together in a way that enables us to better predict patient-specific outcomes compared to currently used clusters based on clinical judgment. △ Less

Submitted 1 August, 2017; v1 submitted 24 April, 2016; originally announced April 2016.

Comments: Accepted at MLHC 2017 to appear in JMLR

arXiv:1511.04157 [pdf, other]

$k$-means: Fighting against Degeneracy in Sequential Monte Carlo with an Application to Tracking

Authors: Kai Fan, Katherine Heller

Abstract: For regular particle filter algorithm or Sequential Monte Carlo (SMC) methods, the initial weights are traditionally dependent on the proposed distribution, the posterior distribution at the current timestamp in the sampled sequence, and the target is the posterior distribution of the previous timestamp. This is technically correct, but leads to algorithms which usually have practical issues with… ▽ More For regular particle filter algorithm or Sequential Monte Carlo (SMC) methods, the initial weights are traditionally dependent on the proposed distribution, the posterior distribution at the current timestamp in the sampled sequence, and the target is the posterior distribution of the previous timestamp. This is technically correct, but leads to algorithms which usually have practical issues with degeneracy, where all particles eventually collapse onto a single particle. In this paper, we propose and evaluate using $k$ means clustering to attack and even take advantage of this degeneracy. Specifically, we propose a Stochastic SMC algorithm which initializes the set of $k$ means, providing the initial centers chosen from the collapsed particles. To fight against degeneracy, we adjust the regular SMC weights, mediated by cluster proportions, and then correct them to retain the same expectation as before. We experimentally demonstrate that our approach has better performance than vanilla algorithms. △ Less

Submitted 12 November, 2015; originally announced November 2015.

arXiv:1509.02866 [pdf, other]

Fast Second-Order Stochastic Backpropagation for Variational Inference

Authors: Kai Fan, Ziteng Wang, Jeff Beck, James Kwok, Katherine Heller

Abstract: We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this appr… ▽ More We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and variational auto-encoder (VAE). Additionally, we compute bounds on the estimator variance of intractable expectations for the family of Lipschitz continuous function. Our method is practical, scalable and model free. We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates. △ Less

Submitted 28 March, 2017; v1 submitted 9 September, 2015; originally announced September 2015.

Comments: Accepted by NIPS 2015

arXiv:1509.00110 [pdf, other]

Bayesian Models for Heterogeneous Personalized Health Data

Authors: Kai Fan, Allison E. Aiello, Katherine A. Heller

Abstract: The purpose of this study is to leverage modern technology (such as mobile or web apps in Beckman et al. (2014)) to enrich epidemiology data and infer the transmission of disease. Homogeneity related research on population level has been intensively studied in previous work. In contrast, we develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infe… ▽ More The purpose of this study is to leverage modern technology (such as mobile or web apps in Beckman et al. (2014)) to enrich epidemiology data and infer the transmission of disease. Homogeneity related research on population level has been intensively studied in previous work. In contrast, we develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infection in a small cell phone community and capture person-specific infection parameters by leveraging a link prior that incorporates additional covariates. We also reexamine the model evolution of the hGCHMM from simple HMMs and LDA, elucidating additional flexibility and interpretability. Due to the non-conjugacy of sparsely coupled HMMs, we design a new approximate distribution, allowing our approach to be more applicable to other application areas. Additionally, we investigate two common link functions, the beta-exponential prior and sigmoid function, both of which allow the development of a principled Bayesian hierarchical framework for disease transmission. The results of our model allow us to predict the probability of infection for each person on each day, and also to infer personal physical vulnerability and the relevant association with covariates. We demonstrate our approach experimentally on both simulation data and real epidemiological records. △ Less

Submitted 31 August, 2015; originally announced September 2015.

Comments: 35 pages; Heterogeneous Flu Diffusion, Social Networks, Dynamic Bayesian Modeling

arXiv:1506.03164 [pdf, other]

Parallelizing MCMC with Random Partition Trees

Authors: Xiangyu Wang, Fangjian Guo, Katherine A. Heller, David B. Dunson

Abstract: The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and runs independent sampling algorithms on each subset. The subset posterior draws a… ▽ More The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and runs independent sampling algorithms on each subset. The subset posterior draws are then aggregated via some combining rules to obtain the final approximation. Existing EP-MCMC algorithms are limited by approximation accuracy and difficulty in resampling. In this article, we propose a new EP-MCMC algorithm PART that solves these problems. The new algorithm applies random partition trees to combine the subset posterior draws, which is distribution-free, easy to resample from and can adapt to multiple scales. We provide theoretical justification and extensive experiments illustrating empirical performance. △ Less

Submitted 26 October, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: 25 pages, 9 figures

arXiv:1411.2674 [pdf, other]

The Bayesian Echo Chamber: Modeling Social Influence via Linguistic Accommodation

Authors: Fangjian Guo, Charles Blundell, Hanna Wallach, Katherine Heller

Abstract: We present the Bayesian Echo Chamber, a new Bayesian generative model for social interaction data. By modeling the evolution of people's language usage over time, this model discovers latent influence relationships between them. Unlike previous work on inferring influence, which has primarily focused on simple temporal dynamics evidenced via turn-taking behavior, our model captures more nuanced in… ▽ More We present the Bayesian Echo Chamber, a new Bayesian generative model for social interaction data. By modeling the evolution of people's language usage over time, this model discovers latent influence relationships between them. Unlike previous work on inferring influence, which has primarily focused on simple temporal dynamics evidenced via turn-taking behavior, our model captures more nuanced influence relationships, evidenced via linguistic accommodation patterns in interaction content. The model, which is based on a discrete analog of the multivariate Hawkes process, permits a fully Bayesian inference algorithm. We validate our model's ability to discover latent influence patterns using transcripts of arguments heard by the US Supreme Court and the movie "12 Angry Men." We showcase our model's capabilities by using it to infer latent influence patterns from Federal Open Market Committee meeting transcripts, demonstrating state-of-the-art performance at uncovering social dynamics in group discussions. △ Less

Submitted 27 January, 2015; v1 submitted 10 November, 2014; originally announced November 2014.

Comments: 14 pages, 7 figures, to appear in AISTATS 2015. Fixed minor formatting issues

arXiv:1210.4864 [pdf]

Graph-Coupled HMMs for Modeling the Spread of Infection

Authors: Wen Dong, Alex Pentland, Katherine A. Heller

Abstract: We develop Graph-Coupled Hidden Markov Models (GCHMMs) for modeling the spread of infectious disease locally within a social network. Unlike most previous research in epidemiology, which typically models the spread of infection at the level of entire populations, we successfully leverage mobile phone data collected from 84 people over an extended period of time to model the spread of infection on… ▽ More We develop Graph-Coupled Hidden Markov Models (GCHMMs) for modeling the spread of infectious disease locally within a social network. Unlike most previous research in epidemiology, which typically models the spread of infection at the level of entire populations, we successfully leverage mobile phone data collected from 84 people over an extended period of time to model the spread of infection on an individual level. Our model, the GCHMM, is an extension of widely-used Coupled Hidden Markov Models (CHMMs), which allow dependencies between state transitions across multiple Hidden Markov Models (HMMs), to situations in which those dependencies are captured through the structure of a graph, or to social networks that may change over time. The benefit of making infection predictions on an individual level is enormous, as it allows people to receive more personalized and relevant health advice. △ Less

Submitted 16 October, 2012; originally announced October 2012.

Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

Report number: UAI-P-2012-PG-227-236

arXiv:1204.0168 [pdf]

doi 10.1007/978-3-642-29047-3_21

Modeling Infection with Multi-agent Dynamics

Authors: Wen Dong, Katherine A. Heller, Alex Sandy Pentland

Abstract: Developing the ability to comprehensively study infections in small populations enables us to improve epidemic models and better advise individuals about potential risks to their health. We currently have a limited understanding of how infections spread within a small population because it has been difficult to closely track an infection within a complete community. The paper presents data closely… ▽ More Developing the ability to comprehensively study infections in small populations enables us to improve epidemic models and better advise individuals about potential risks to their health. We currently have a limited understanding of how infections spread within a small population because it has been difficult to closely track an infection within a complete community. The paper presents data closely tracking the spread of an infection centered on a student dormitory, collected by leveraging the residents' use of cellular phones. The data are based on daily symptom surveys taken over a period of four months and proximity tracking through cellular phones. We demonstrate that using a Bayesian, discrete-time multi-agent model of infection to model real-world symptom reports and proximity tracking records gives us important insights about infec-tions in small populations. △ Less

Submitted 11 October, 2014; v1 submitted 1 April, 2012; originally announced April 2012.

arXiv:1203.3468 [pdf]

Bayesian Rose Trees

Authors: Charles Blundell, Yee Whye Teh, Katherine A. Heller

Abstract: Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bayesi… ▽ More Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bayesian hierarchical clustering algorithm that can produce trees with arbitrary branching structure at each node, known as rose trees. We interpret these trees as mixtures over partitions of a data set, and use a computationally efficient, greedy agglomerative algorithm to find the rose trees which have high marginal likelihood given the data. Lastly, we perform experiments which demonstrate that rose trees are better models of data than the typical binary trees returned by other hierarchical clustering algorithms. △ Less

Submitted 15 March, 2012; originally announced March 2012.

Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

Report number: UAI-P-2010-PG-65-72

arXiv:1106.1157 [pdf, other]

Bayesian and L1 Approaches to Sparse Unsupervised Learning

Authors: Shakir Mohamed, Katherine Heller, Zoubin Ghahramani

Abstract: The use of L1 regularisation for sparse learning has generated immense research interest, with successful application in such diverse areas as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when c… ▽ More The use of L1 regularisation for sparse learning has generated immense research interest, with successful application in such diverse areas as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared with other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of "L1", and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoiding unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance. △ Less

Submitted 17 August, 2012; v1 submitted 6 June, 2011; originally announced June 2011.

Comments: In Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, Scotland, 2012

arXiv:0912.5193 [pdf, ps, other]

doi 10.1214/09-AOAS321

Ranking relations using analogies in biological and information networks

Authors: Ricardo Silva, Katherine Heller, Zoubin Ghahramani, Edoardo M. Airoldi

Abstract: Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B ^{(N)}\}$, measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work addresses the following question: is the relation… ▽ More Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B ^{(N)}\}$, measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work addresses the following question: is the relation between objects A and B analogous to those relations found in $\mathbf{S}$? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application on discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even if a small set of protein pairs is provided. △ Less

Submitted 29 August, 2013; v1 submitted 28 December, 2009; originally announced December 2009.

Comments: Published in at http://dx.doi.org/10.1214/09-AOAS321 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS321

Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 2, 615-644

arXiv:0801.0461 [pdf, other]

An Alternative Prior Process for Nonparametric Bayesian Clustering

Authors: Hanna M. Wallach, Shane T. Jensen, Lee Dicker, Katherine A. Heller

Abstract: Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering -… ▽ More Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering -- the uniform process -- for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings. △ Less

Submitted 15 October, 2010; v1 submitted 2 January, 2008; originally announced January 2008.

Journal ref: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, JMLR W & CP 9, pp. 892-899

Showing 1–26 of 26 results for author: Heller, K