-
SIMBA -- A Bayesian Decision Framework for the Identification of Optimal Biomarker Subgroups for Cancer Basket Clinical Trials
Authors:
Shijie Yuan,
Jiaxin Liu,
Zhihua Gong,
Xia Qin,
Crystal Qin,
Yuan Ji,
Peter Müller
Abstract:
We consider basket trials in which a biomarker-targeting drug may be efficacious for patients across different disease indications. Patients are enrolled if their cells exhibit some level of biomarker expression. The threshold level is allowed to vary by indication. The proposed SIMBA method uses a decision framework to identify optimal biomarker subgroups (OBS) defined by an optimal biomarker threshold for each indication. The optimality is achieved through minimizing a posterior expected loss that balances estimation accuracy and investigator preference for broadly effective therapeutics. A Bayesian hierarchical model is proposed to adaptively borrow information across indications and enhance the accuracy in the estimation of the OBS. The operating characteristics of SIMBA are assessed via simulations and compared against a simplified version and an existing alternative method, both of which do not borrow information. SIMBA is expected to improve the identification of patient sub-populations that may benefit from biomarker-driven therapeutics.
Submitted 19 May, 2025;
originally announced May 2025.
-
Borrowing strength between unaligned binary time-series via Bayesian nonparametric rescaling of Unified Skewed Normal priors
Authors:
Beatrice Cantoni,
Giovanni Poli,
Elizabeth Juarez-Colunga,
Peter Müller
Abstract:
We define a Bayesian semi-parametric model to effectively conduct inference with unaligned longitudinal binary data. The proposed strategy is motivated by data from the Human Epilepsy Project (HEP), which collects seizure occurrence data for epilepsy patients, together with relevant covariates. The model is designed to flexibly accommodate the particular challenges that arise with such data. First, epilepsy data require models that can allow for extensive heterogeneity, across both patients and time. In this regard, state space models offer a flexible, yet still analytically amenable class of models. Nevertheless, seizure time-series might share similar behavioral patterns, such as local prolonged periods of elevated seizure presence, which we refer to as "clumping". Such similarities can be used to share strength across patients and define subgroups. However, due to the lack of alignment, straightforward hierarchical modeling of latent state space parameters is not practicable. To overcome this constraint, we construct a strategy that preserves the flexibility of individual trajectories while also exploiting similarities across individuals to borrow information through a nonparametric prior. On the one hand, heterogeneity is ensured by (almost) subject-specific state-space submodels. On the other, borrowing of information is obtained by introducing a Pitman-Yor prior on group-specific probabilities for patterns of clinical interest. We design a posterior sampling strategy that leverages recent developments of binary state space models using the Unified Skewed Normal family (SUN). The model, which allows the sharing of information across individuals with similar disease traits over time, can more generally be adapted to any setting characterized by unaligned binary longitudinal data.
Submitted 9 May, 2025;
originally announced May 2025.
-
Bayesian Density-Density Regression with Application to Cell-Cell Communications
Authors:
Khai Nguyen,
Yang Ni,
Peter Mueller
Abstract:
We introduce a scalable framework for regressing multivariate distributions onto multivariate distributions, motivated by the application of inferring cell-cell communication from population-scale single-cell data. The observed data consist of pairs of multivariate distributions for ligands from one cell type and corresponding receptors from another. For each ordered pair $e=(l,r)$ of cell types $(l \neq r)$ and each sample $i = 1, \ldots, n$, we observe a pair of distributions $(F_{ei}, G_{ei})$ of gene expressions for ligands and receptors of cell types $l$ and $r$, respectively. The aim is to set up a regression of receptor distributions $G_{ei}$ given ligand distributions $F_{ei}$. A key challenge is that these distributions reside in distinct spaces of differing dimensions. We formulate the regression of multivariate densities on multivariate densities using a generalized Bayes framework with the sliced Wasserstein distance between fitted and observed distributions. Finally, we use inference under such regressions to define a directed graph for cell-cell communications.
Submitted 16 April, 2025;
originally announced April 2025.
-
DPGLM: A Semiparametric Bayesian GLM with Inhomogeneous Normalized Random Measures
Authors:
Entejar Alam,
Paul J. Rathouz,
Peter Mueller
Abstract:
We introduce a novel varying-weight dependent Dirichlet process (DDP) model that extends a recently developed semi-parametric generalized linear model (SPGLM) by adding a nonparametric Bayesian prior on the baseline distribution of the GLM. We show that the resulting model takes the form of an inhomogeneous completely random measure that arises from exponential tilting of a normalized completely random measure. Building on familiar posterior sampling methods for mixtures with respect to normalized random measures, we introduce posterior simulation in the resulting model. We validate the proposed methodology through extensive simulation studies and illustrate its application using data from a speech intelligibility study.
Submitted 28 March, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures
Authors:
Khai Nguyen,
Peter Mueller
Abstract:
Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering, with density estimation as a secondary goal (Wade and Ghahramani, 2018; Dahl et al., 2022). We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing estimation of the mixing measure (or mixture) as an inference target. One of the key features is the model-agnostic nature of the approach, which remains valid under arbitrarily complex dependence structures in the underlying sampling model. Using a decision-theoretic framework, our method identifies a point estimate by minimizing posterior expected loss. A loss function is defined as a discrepancy between mixing measures. Estimating the mixing measure implies inference on the mixture density and the random partition. Exploiting the discrete nature of the mixing measure, we use a version of sliced Wasserstein distance. We introduce two specific variants for Gaussian mixtures. The first, mixed sliced Wasserstein, applies generalized geodesic projections on the product of the Euclidean space and the manifold of symmetric positive definite matrices. The second, sliced mixture Wasserstein, leverages the linearity of Gaussian mixture measures for efficient projection.
Submitted 7 May, 2025; v1 submitted 21 November, 2024;
originally announced November 2024.
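For readers unfamiliar with the sliced Wasserstein distance named in the abstract above, the following is a minimal sketch of the generic Monte Carlo estimator between two equal-size point clouds. It is not the paper's mixed sliced Wasserstein or sliced mixture Wasserstein variant. The function name `sliced_wasserstein` and its parameters (`n_projections`, `p`, `seed`) are assumptions chosen for illustration.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    Assumption for this sketch: x and y are samples of equal size and
    dimension, given as arrays of shape (n, d) with uniform weights.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal-size samples of equal dimension")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions and normalize them onto the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction, then sort each column:
    # in one dimension the optimal coupling matches order statistics.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # Average the p-th power of the 1-D distances over points and directions.
    return float(np.mean(np.abs(px - py) ** p) ** (1.0 / p))
```

The estimate is zero when both samples coincide and is symmetric in its arguments for a fixed seed. Weighted or unequal-size samples would need a quantile-based one-dimensional solver instead of plain sorting.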
-
In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models
Authors:
Ayrton San Joaquin,
Bin Wang,
Zhengyuan Liu,
Nicholas Asher,
Brian Lim,
Philippe Muller,
Nancy F. Chen
Abstract:
Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model. Notably, we assess the model's internal gradients to estimate this relationship, aiming to rank the contribution of each training point. To enhance efficiency, we propose an optimization to compute influence functions with a reduced number of layers while achieving similar accuracy. By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data. Meanwhile, using influence functions to analyze model coverage of certain test samples could provide a reliable and interpretable signal on the training set's coverage of those test points.
Submitted 2 October, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Clustering and Meta-Analysis Using a Mixture of Dependent Linear Tail-Free Priors
Authors:
Bernardo Flores,
Peter Mueller
Abstract:
We propose a novel nonparametric Bayesian approach for meta-analysis with event time outcomes. The model is an extension of linear dependent tail-free processes. The extension includes a modification to facilitate (conditionally) conjugate posterior updating and a hierarchical extension with a random partition of studies. The partition is formalized as a Dirichlet process mixture. The model development is motivated by a meta-analysis of cancer immunotherapy studies. The aim is to validate the use of relevant biomarkers in the design of immunotherapy studies. The hypothesis is about immunotherapy in general, rather than about a specific tumor type, therapy and marker. This broad hypothesis leads to a very diverse set of studies being included in the analysis and gives rise to substantial heterogeneity across studies.
Submitted 22 June, 2024;
originally announced June 2024.
-
Dir-SPGLM: A Bayesian semiparametric GLM with data-driven reference distribution
Authors:
Entejar Alam,
Peter Müller,
Paul J. Rathouz
Abstract:
The recently developed semi-parametric generalized linear model (SPGLM) offers more flexibility as compared to the classical GLM by including the baseline or reference distribution of the response as an additional parameter in the model. However, some inference summaries are not easily generated under existing maximum-likelihood based inference (ML-SPGLM). This includes uncertainty in estimation for model-derived functionals such as exceedance probabilities. The latter are critical in a clinical diagnostic or decision-making setting. In this article, by placing a Dirichlet prior on the baseline distribution, we propose a Bayesian model-based approach for inference to address these important gaps. We establish consistency and asymptotic normality results for the implied canonical parameter. Simulation studies and an illustration with data from an aging research study confirm that the proposed method performs comparably or better in comparison with ML-SPGLM. The proposed Bayesian framework is most attractive for inference with small sample training data or in sparse-data scenarios.
Submitted 7 April, 2024;
originally announced April 2024.
-
A Multivariate Polya Tree Model for Meta-Analysis with Event Time Distributions
Authors:
Giovanni Poli,
Elena Fountzilas,
Apostolia-Maria Tsimeridou,
Peter Müller
Abstract:
We develop a non-parametric Bayesian prior for a family of random probability measures by extending the Polya tree ($PT$) prior to a joint prior for a set of probability measures $G_1,\dots,G_n$, suitable for meta-analysis with event time outcomes. In the application to meta-analysis $G_i$ is the event time distribution specific to study $i$. The proposed model defines a regression on study-specific covariates by introducing increased correlation for any pair of studies with similar characteristics. The desired multivariate $PT$ model is constructed by introducing a hierarchical prior on the conditional splitting probabilities in the $PT$ construction for each of the $G_i$. The hierarchical prior replaces the independent beta priors for the splitting probability in the $PT$ construction with a Gaussian process prior for corresponding (logit) splitting probabilities across all studies. The Gaussian process is indexed by study-specific covariates, introducing the desired dependence with increased correlation for similar studies. The main feature of the proposed construction is (conditionally) conjugate posterior updating with commonly reported inference summaries for event time data. The construction is motivated by a meta-analysis over cancer immunotherapy studies.
Submitted 8 October, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Regression with Variable Dimension Covariates
Authors:
Peter Mueller,
Fernando Andrés Quintana,
Garritt L. Page
Abstract:
Regression is one of the most fundamental statistical inference problems. A broad definition of regression problems is as estimation of the distribution of an outcome using a family of probability models indexed by covariates. Despite the ubiquitous nature of regression problems and the abundance of related methods and results there is a surprising gap in the literature. There are no well-established methods for regression with varying-dimension covariate vectors, despite the common occurrence of such problems. In this paper we review some recent related papers proposing varying dimension regression by way of random partitions.
Submitted 25 September, 2023;
originally announced September 2023.
-
Graph-Aligned Random Partition Model (GARP)
Authors:
Giovanni Rebaudo,
Peter Mueller
Abstract:
Bayesian nonparametric mixtures and random partition models are powerful tools for probabilistic clustering. However, standard independent mixture models can be restrictive in some applications such as inference on cell lineage due to the biological relations of the clusters. The increasing availability of large genomic data requires new statistical tools to perform model-based clustering and infer the relationship between homogeneous subgroups of units. Motivated by single-cell RNA applications we develop a novel dependent mixture model to jointly perform cluster analysis and align the clusters on a graph. Our flexible graph-aligned random partition model (GARP) exploits Gibbs-type priors as building blocks, allowing us to derive analytical results on the graph-aligned random partition's probability mass function (pmf). We derive a generalization of the Chinese restaurant process from the pmf and a related efficient and neat MCMC algorithm to perform Bayesian inference. We perform posterior inference on real single-cell RNA data from mice stem cells. We further investigate the performance of our model in capturing the underlying clustering structure as well as the underlying graph by means of simulation studies.
Submitted 17 May, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
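The abstract above refers to a generalization of the Chinese restaurant process (CRP). As background only, here is a minimal sketch of the standard one-parameter CRP, which does not include the graph-aligned structure that GARP adds. The function name `sample_crp_partition` and the concentration parameter name `alpha` are assumptions for illustration.

```python
import random


def sample_crp_partition(n, alpha, seed=0):
    """Draw a random partition of n items from a standard CRP.

    Assumption for this sketch: alpha > 0 is the concentration parameter.
    Returns the cluster label of each item and the list of cluster sizes.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []
    for _ in range(n):
        # An existing cluster k is chosen with weight sizes[k];
        # a new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1  # join an existing cluster
        labels.append(k)
    return labels, sizes
```

Larger values of `alpha` tend to produce more clusters. The first item always opens cluster 0, and labels are assigned in order of appearance.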
-
Predicting subgroup treatment effects for a new study: Motivations, results and learnings from running a data challenge in a pharmaceutical corporation
Authors:
Björn Bornkamp,
Silvia Zaoli,
Michela Azzarito,
Ruvie Martin,
Carsten Philipp Müller,
Conor Moloney,
Giulia Capestro,
David Ohlssen,
Mark Baillie
Abstract:
We present the motivation, experience and learnings from a data challenge conducted at a large pharmaceutical corporation on the topic of subgroup identification. The data challenge aimed at exploring approaches to subgroup identification for future clinical trials. To mimic a realistic setting, participants had access to 4 Phase III clinical trials to derive a subgroup and predict its treatment effect on a future study not accessible to challenge participants. 30 teams registered for the challenge with around 100 participants, primarily from the Biostatistics organisation. We outline the motivation for running the challenge, the challenge rules and logistics. Finally, we present the results of the challenge, the participant feedback as well as the learnings, and how these learnings can be translated into statistical practice.
Submitted 12 April, 2023;
originally announced April 2023.
-
Learning Correlated Equilibria in Mean-Field Games
Authors:
Paul Muller,
Romuald Elie,
Mark Rowland,
Mathieu Lauriere,
Julien Perolat,
Sarah Perrin,
Matthieu Geist,
Georgios Piliouras,
Olivier Pietquin,
Karl Tuyls
Abstract:
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
Submitted 22 August, 2022;
originally announced August 2022.
-
A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning
Authors:
Mauricio Tec,
Yunshan Duan,
Peter Müller
Abstract:
Reinforcement Learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has been an increasing interest in RL techniques for healthcare applications. We introduce two related applications as motivating examples. In both applications, the sequential nature of the decisions is restricted to sequential stopping. Rather than a comprehensive survey, the focus of the discussion is on solutions using standard tools for these two relatively simple sequential stopping problems. Both problems are inspired by adaptive clinical trial design. We use examples to explain the terminology and mathematical background that underlie each framework and map one to the other. The implementations and results illustrate the many similarities between RL and BSD. The results motivate the discussion of the potential strengths and limitations of each approach.
Submitted 4 October, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.
-
Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
Authors:
Mathieu Laurière,
Sarah Perrin,
Sertan Girgin,
Paul Muller,
Ayush Jain,
Theophile Cabannes,
Georgios Piliouras,
Julien Pérolat,
Romuald Élie,
Olivier Pietquin,
Matthieu Geist
Abstract:
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from being trivial in the case of non-linear function approximators that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.
Submitted 17 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
An Explainable Stacked Ensemble Model for Static Route-Free Estimation of Time of Arrival
Authors:
Sören Schleibaum,
Jörg P. Müller,
Monika Sester
Abstract:
To compare alternative taxi schedules and to compute them, as well as to provide insights into an upcoming taxi trip to drivers and passengers, the duration of a trip or its Estimated Time of Arrival (ETA) is predicted. To reach a high prediction precision, machine learning models for ETA are state of the art. One yet unexploited option to further increase prediction precision is to combine multiple ETA models into an ensemble. While an increase of prediction precision is likely, the main drawback is that the predictions made by such an ensemble become less transparent due to the sophisticated ensemble architecture. One option to remedy this drawback is to apply eXplainable Artificial Intelligence (XAI). The contribution of this paper is three-fold. First, we combine multiple machine learning models from our previous work for ETA into a two-level ensemble model - a stacked ensemble model - which on its own is novel; therefore, we can outperform previous state-of-the-art static route-free ETA approaches. Second, we apply existing XAI methods to explain the first- and second-level models of the ensemble. Third, we propose three joining methods for combining the first-level explanations with the second-level ones. Those joining methods enable us to explain stacked ensembles for regression tasks. An experimental evaluation shows that the ETA models correctly learned the importance of those input features driving the prediction.
Submitted 11 January, 2024; v1 submitted 17 March, 2022;
originally announced March 2022.
-
A Recommender System Based on a Double Feature Allocation Model
Authors:
Qiaohui Lin,
Peter Mueller
Abstract:
A collaborative filtering recommender system predicts user preferences by discovering common features among users and items. We implement such inference using a Bayesian double feature allocation model, that is, a model for random pairs of subsets. We use an Indian buffet process (IBP) to link users and items to features. Here a feature is a subset of users and a matching subset of items. By training feature-specific rating effects, we predict ratings. We use MovieLens Data to demonstrate posterior inference in the model and prediction of user preferences for unseen items compared to items they have previously rated.
Part of the implementation is a novel semi-consensus Monte Carlo method to accommodate large numbers of users and items, as is typical for related applications. The proposed approach implements parallel posterior sampling in multiple shards of users while sharing item-related global parameters across shards.
Submitted 2 February, 2022;
originally announced February 2022.
-
Bayesian Nonparametric Common Atoms Regression for Generating Synthetic Controls in Clinical Trials
Authors:
Noirrit Kiran Chandra,
Abhra Sarkar,
John F. de Groot,
Ying Yuan,
Peter Müller
Abstract:
The availability of electronic health records (EHR) has opened opportunities to supplement increasingly expensive and difficult to carry out randomized controlled trials (RCT) with evidence from readily available real world data. In this paper, we use EHR data to construct synthetic control arms for treatment-only single arm trials. We propose a novel nonparametric Bayesian common atoms mixture model that allows us to find equivalent population strata in the EHR and the treatment arm and then resample the EHR data to create equivalent patient populations under both the single arm trial and the resampled EHR. Resampling is implemented via a density-free importance sampling scheme. Using the synthetic control arm, inference for the treatment effect can then be carried out using any method available for RCTs. Alternatively the proposed nonparametric Bayesian model allows straightforward model-based inference. In simulation experiments, the proposed method exhibits higher power than alternative methods in detecting treatment effects, specifically for non-linear response functions. We apply the method to supplement single arm treatment-only glioblastoma studies with a synthetic control arm based on historical trials.
Submitted 6 May, 2023; v1 submitted 31 December, 2021;
originally announced January 2022.
-
Separate Exchangeability as Modeling Principle in Bayesian Nonparametrics
Authors:
Giovanni Rebaudo,
Qiaohui Lin,
Peter Mueller
Abstract:
We argue for the use of separate exchangeability as a modeling principle in Bayesian nonparametric (BNP) inference. Separate exchangeability is de facto widely applied in the Bayesian parametric case, e.g., it naturally arises in simple mixed models. However, while in some areas, such as random graphs, separate and (closely related) joint exchangeability are widely used, it is curiously underused for several other applications in BNP. We briefly review the definition of separate exchangeability focusing on the implications of such a definition in Bayesian modeling. We then discuss two tractable classes of models that implement separate exchangeability that are the natural counterparts of familiar partially exchangeable BNP models.
The first is nested random partitions for a data matrix, defining a partition of columns and nested partitions of rows, nested within column clusters. Many recent models for nested partitions implement partially exchangeable models related to variations of the well-known nested Dirichlet process. We argue that inference under such models in some cases ignores important features of the experimental setup. We obtain the separately exchangeable counterpart of such partially exchangeable partition structures.
The second class is about setting up separately exchangeable priors for a nonparametric regression model when multiple sets of experimental units are involved. We highlight how a Dirichlet process mixture of linear models known as ANOVA DDP can naturally implement separate exchangeability in such regression problems. Finally, we illustrate how to perform inference under such models in two real data examples.
Submitted 20 June, 2024; v1 submitted 14 December, 2021;
originally announced December 2021.
-
A Unified Decision Framework for Phase I Dose-Finding Designs
Authors:
Yunshan Duan,
Shijie Yuan,
Yuan Ji,
Peter Mueller
Abstract:
The purpose of a phase I dose-finding clinical trial is to investigate the toxicity profiles of various doses for a new drug and identify the maximum tolerated dose. Over the past three decades, various dose-finding designs have been proposed and discussed, including conventional model-based designs, new model-based designs using toxicity probability intervals, and rule-based designs. We present a simple decision framework that can generate several popular designs as special cases. We show that these designs share common elements under the framework, such as the same likelihood function, the use of loss functions, and the nature of the optimal decisions as Bayes rules. They differ mostly in the choice of the prior distributions. We present theoretical results on the decision framework and its link to specific and popular designs like mTPI, BOIN, and CRM. These results provide useful insights into the designs and their underlying assumptions, and convey information to help practitioners select an appropriate design.
Submitted 23 November, 2021;
originally announced November 2021.
-
Bayesian Semiparametric Hidden Markov Tensor Partition Models for Longitudinal Data with Local Variable Selection
Authors:
Giorgio Paulon,
Peter Müller,
Abhra Sarkar
Abstract:
We present a flexible Bayesian semiparametric mixed model for longitudinal data analysis in the presence of potentially high-dimensional categorical covariates. Building on a novel hidden Markov tensor decomposition technique, our proposed method allows the fixed effects components to vary between dependent random partitions of the covariate space at different time points. The mechanism not only allows different sets of covariates to be included in the model at different time points but also allows the selected predictors' influences to vary flexibly over time. Smooth time-varying additive random effects are used to capture subject specific heterogeneity. We establish posterior convergence guarantees for both function estimation and variable selection. We design a Markov chain Monte Carlo algorithm for posterior computation. We evaluate the method's empirical performances through synthetic experiments and demonstrate its practical utility through real world applications.
Submitted 4 August, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.
-
Bayesian Scalable Precision Factor Analysis for Massive Sparse Gaussian Graphical Models
Authors:
Noirrit Kiran Chandra,
Peter Mueller,
Abhra Sarkar
Abstract:
We propose a novel approach to estimating the precision matrix of multivariate Gaussian data that relies on decomposing them into a low-rank and a diagonal component. Such decompositions are very popular for modeling large covariance matrices as they admit a latent factor based representation that allows easy inference. The same is however not true for precision matrices due to the lack of computationally convenient representations which restricts inference to low-to-moderate dimensional problems. We address this remarkable gap in the literature by building on a latent variable representation for such decomposition for precision matrices. The construction leads to an efficient Gibbs sampler that scales very well to high-dimensional problems far beyond the limits of the current state-of-the-art. The ability to efficiently explore the full posterior space also allows the model uncertainty to be easily assessed. The decomposition crucially additionally allows us to adapt sparsity inducing priors to shrink the insignificant entries of the precision matrix toward zero, making the approach adaptable to high-dimensional small-sample-size sparse settings. Exact zeros in the matrix encoding the underlying conditional independence graph are then determined via a novel posterior false discovery rate control procedure. A near minimax optimal posterior concentration rate for estimating precision matrices is attained by our method under mild regularity assumptions. We evaluate the method's empirical performance through synthetic experiments and illustrate its practical utility in data sets from two different application domains.
Submitted 16 August, 2022; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Search Algorithms and Loss Functions for Bayesian Clustering
Authors:
David B. Dahl,
Devin J. Johnson,
Peter Mueller
Abstract:
We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, the minimization of the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in a random order and is embarrassingly parallel. We consider several loss functions, including Binder loss and variation of information. We note that criticisms of Binder loss are the result of using equal penalties of misclassification and we show an efficient means to compute Binder loss with potentially unequal penalties. Furthermore, we extend the original variation of information to allow for unequal penalties and show no increased computational costs. We provide a reference implementation of our algorithm. Using a variety of examples, we show that our method produces clustering estimates that better minimize the expected loss and are obtained faster than existing methods.
Submitted 10 May, 2021;
originally announced May 2021.
-
Bayesian Nonparametric Bivariate Survival Regression for Current Status Data
Authors:
Giorgio Paulon,
Peter Müller,
Victor G. Sal Y Rosas
Abstract:
We consider nonparametric inference for event time distributions based on current status data. We show that in this scenario conventional mixture priors, including the popular Dirichlet process mixture prior, lead to biologically uninterpretable results as they unnaturally skew the probability mass for the event times toward the extremes of the observed data. Simple assumptions on dependent censoring can fix the problem. We then extend the discussion to bivariate current status data with partial ordering of the two outcomes. In addition to dependent censoring, we also exploit some minimal known structure relating the two event times. We design a Markov chain Monte Carlo algorithm for posterior simulation. Applied to a recurrent infection study, the method provides novel insights into how symptoms-related hospital visits are affected by covariates.
Submitted 22 September, 2020; v1 submitted 14 September, 2020;
originally announced September 2020.
-
The Dependent Dirichlet Process and Related Models
Authors:
Fernand A. Quintana,
Peter Mueller,
Alejandro Jara,
Steven N. MacEachern
Abstract:
Standard regression approaches assume that some finite number of the response distribution characteristics, such as location and scale, change as a (parametric or nonparametric) function of predictors. However, it is not always appropriate to assume a location/scale representation, where the error distribution has unchanging shape over the predictor space. In fact, it often happens in applied research that the distribution of responses under study changes with predictors in ways that cannot be reasonably represented by a finite dimensional functional form. This can seriously affect the answers to the scientific questions of interest, and therefore more general approaches are indeed needed. This gives rise to the study of fully nonparametric regression models. We review some of the main Bayesian approaches that have been employed to define probability models where the complete response distribution may vary flexibly with predictors. We focus on developments based on modifications of the Dirichlet process, historically termed dependent Dirichlet processes, and some of the extensions that have been proposed to tackle this general problem using nonparametric approaches.
Submitted 12 July, 2020;
originally announced July 2020.
-
Clustering and Prediction with Variable Dimension Covariates
Authors:
Garritt L. Page,
Fernando A. Quintana,
Peter Müller
Abstract:
In many applied fields incomplete covariate vectors are commonly encountered. It is well known that this can be problematic when making inference on model parameters, but its impact on prediction performance is less understood. We develop a method based on covariate dependent partition models that seamlessly handles missing covariates while completely avoiding any type of imputation. The method we develop allows in-sample predictions as well as out-of-sample prediction, even if the missing pattern in the new subjects' incomplete covariate vector was not seen in the training data. Any data type, including categorical or continuous covariates, is permitted. In simulation studies the proposed method compares favorably. We illustrate the method in two application examples.
Submitted 12 July, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
A Semi-parametric Bayesian Approach to Population Finding with Time-to-Event and Toxicity Data in a Randomized Clinical Trial
Authors:
Satoshi Morita,
Peter Müller,
Hiroyasu Abe
Abstract:
A utility-based Bayesian population finding (BaPoFi) method was proposed by Morita and Müller (2017, Biometrics, 1355-1365) to analyze data from a randomized clinical trial with the aim of identifying good predictive baseline covariates for optimizing the target population for a future study. The approach casts the population finding process as a formal decision problem together with a flexible probability model using a random forest to define a regression mean function. BaPoFi is constructed to handle a single continuous or binary outcome variable. In this paper, we develop BaPoFi-TTE as an extension of the earlier approach for clinically important cases of time-to-event (TTE) data with censoring, and also accounting for a toxicity outcome. We model the association of TTE data with baseline covariates using a semi-parametric failure time model with a Pólya tree prior for an unknown error term and a random forest for a flexible regression mean function. We define a utility function that addresses a trade-off between efficacy and toxicity as one of the important clinical considerations for population finding. We examine the operating characteristics of the proposed method in extensive simulation studies. For illustration, we apply the proposed method to data from a randomized oncology clinical trial. Concerns in a preliminary analysis of the same data based on a parametric model motivated the proposed more general approach.
Submitted 26 October, 2019;
originally announced October 2019.
-
InceptionTime: Finding AlexNet for Time Series Classification
Authors:
Hassan Ismail Fawaz,
Benjamin Lucas,
Germain Forestier,
Charlotte Pelletier,
Daniel F. Schmidt,
Jonathan Weber,
Geoffrey I. Webb,
Lhassane Idoumghar,
Pierre-Alain Muller,
François Petitjean
Abstract:
This paper brings deep learning to the forefront of research into Time Series Classification (TSC). TSC is the area of machine learning tasked with the categorization (or labelling) of time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate, HIVE-COTE cannot be applied to many real-world datasets because of its high training time complexity of O(N^2 * T^4) for a dataset with N time series of length T. For example, it takes HIVE-COTE more than 8 days to learn from a small dataset with N = 1500 time series of short length T = 46. Meanwhile deep learning has received enormous attention because of its high accuracy and scalability. Recent approaches to deep learning for TSC have been scalable, but less accurate than HIVE-COTE. We introduce InceptionTime - an ensemble of deep Convolutional Neural Network (CNN) models, inspired by the Inception-v4 architecture. Our experiments show that InceptionTime is on par with HIVE-COTE in terms of accuracy while being much more scalable: not only can it learn from 1,500 time series in one hour but it can also learn from 8M time series in 13 hours, a quantity of data that is fully out of reach of HIVE-COTE.
Submitted 5 December, 2020; v1 submitted 11 September, 2019;
originally announced September 2019.
-
Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks
Authors:
Hassan Ismail Fawaz,
Germain Forestier,
Jonathan Weber,
Lhassane Idoumghar,
Pierre-Alain Muller
Abstract:
Purpose: Manual feedback from senior surgeons observing less experienced trainees is a laborious task that is very expensive, time-consuming and prone to subjectivity. With the number of surgical procedures increasing annually, there is an unprecedented need to provide an accurate, objective and automatic evaluation of trainees' surgical skills in order to improve surgical practice. Methods: In this paper, we designed a convolutional neural network (CNN) to classify surgical skills by extracting latent patterns in the trainees' motions performed during robotic surgery. The method is validated on the JIGSAWS dataset for two surgical skills evaluation tasks: classification and regression. Results: Our results show that deep neural networks constitute robust machine learning models that are able to reach new competitive state-of-the-art performance on the JIGSAWS dataset. While we leveraged CNNs' efficiency, we were able to minimize their black-box effect using the class activation map technique. Conclusions: This characteristic allowed our method to automatically pinpoint which parts of the surgery influenced the skill evaluation the most, thus allowing us to explain a surgical skill classification and provide surgeons with a novel personalized feedback technique. We believe this type of interpretable machine learning model could integrate within "Operation Room 2.0" and support novice surgeons in improving their skills to eventually become experts.
Submitted 20 August, 2019;
originally announced August 2019.
-
BOAH: A Tool Suite for Multi-Fidelity Bayesian Optimization & Analysis of Hyperparameters
Authors:
Marius Lindauer,
Katharina Eggensperger,
Matthias Feurer,
André Biedenkapp,
Joshua Marben,
Philipp Müller,
Frank Hutter
Abstract:
Hyperparameter optimization and neural architecture search can become prohibitively expensive for regular black-box Bayesian optimization because the training and evaluation of a single model can easily take several hours. To overcome this, we introduce a comprehensive tool suite for effective multi-fidelity Bayesian optimization and the analysis of its runs. The suite, written in Python, provides a simple way to specify complex design spaces, a robust and efficient combination of Bayesian optimization and HyperBand, and a comprehensive analysis of the optimization process and its outcomes.
Submitted 16 August, 2019;
originally announced August 2019.
-
Consensus Monte Carlo for Random Subsets using Shared Anchors
Authors:
Yang Ni,
Yuan Ji,
Peter Mueller
Abstract:
We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR). Supplementary materials for this article are available online.
Submitted 25 February, 2020; v1 submitted 28 June, 2019;
originally announced June 2019.
-
Automatic alignment of surgical videos using kinematic data
Authors:
Hassan Ismail Fawaz,
Germain Forestier,
Jonathan Weber,
François Petitjean,
Lhassane Idoumghar,
Pierre-Alain Muller
Abstract:
Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed the surgical education systems worldwide. With the advent of Operation Room 2.0, recording video, kinematic and many other types of data during the surgery became an easy task, thus allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently, surgical videos have been shown to provide a structure for peer coaching, enabling novice trainees to learn from experienced surgeons by replaying those videos. However, the high inter-operator variability in surgical gesture duration and execution renders learning from comparing novice to expert surgical videos a very difficult task. In this paper, we propose a novel technique to align multiple videos based on the alignment of their corresponding kinematic multivariate time series data. By leveraging the Dynamic Time Warping measure, our algorithm synchronizes a set of videos in order to show the same gesture being performed at different speeds. We believe that the proposed approach is a valuable addition to the existing learning tools for surgery.
Submitted 26 April, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
A Bayesian Nonparametric Approach for Evaluating the Causal Effect of Treatment in Randomized Trials with Semi-Competing Risks
Authors:
Yanxun Xu,
Daniel Scharfstein,
Peter Müller,
Michael Daniels
Abstract:
We develop a Bayesian nonparametric (BNP) approach to evaluate the causal effect of treatment in a randomized trial where a nonterminal event may be censored by a terminal event, but not vice versa (i.e., semi-competing risks). Based on the idea of principal stratification, we define a novel estimand for the causal effect of treatment on the nonterminal event. We introduce identification assumptions, indexed by a sensitivity parameter, and show how to draw inference using our BNP approach. We conduct simulation studies and illustrate our methodology using data from a brain cancer trial.
Submitted 21 July, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Adversarial Attacks on Deep Neural Networks for Time Series Classification
Authors:
Hassan Ismail Fawaz,
Germain Forestier,
Jonathan Weber,
Lhassane Idoumghar,
Pierre-Alain Muller
Abstract:
Time Series Classification (TSC) problems are encountered in many real life data mining tasks ranging from medicine and security to human activity recognition and food safety. With the recent success of deep neural networks in various domains such as computer vision and natural language processing, researchers started adopting these techniques for solving time series data mining problems. However, to the best of our knowledge, no previous work has considered the vulnerability of deep learning models to adversarial time series examples, which could potentially make them unreliable in situations where the decision taken by the classifier is crucial such as in medicine and security. For computer vision problems, such attacks have been shown to be very easy to perform by altering the image and adding an imperceptible amount of noise to trick the network into wrongly classifying the input image. Following this line of work, we propose to leverage existing adversarial attack mechanisms to add a special noise to the input time series in order to decrease the network's confidence when classifying instances at test time. Our results reveal that current state-of-the-art deep learning time series classifiers are vulnerable to adversarial attacks which can have major consequences in multiple domains such as food safety and quality assurance.
Submitted 26 April, 2019; v1 submitted 17 March, 2019;
originally announced March 2019.
-
Deep Neural Network Ensembles for Time Series Classification
Authors:
Hassan Ismail Fawaz,
Germain Forestier,
Jonathan Weber,
Lhassane Idoumghar,
Pierre-Alain Muller
Abstract:
Deep neural networks have revolutionized many fields such as computer vision and natural language processing. Inspired by this recent success, deep learning started to show promising results for Time Series Classification (TSC). However, neural networks are still behind the state-of-the-art TSC algorithms, which are currently composed of ensembles of 37 non-deep-learning-based classifiers. We attribute this gap in performance to the lack of neural network ensembles for TSC. Therefore in this paper, we show how an ensemble of 60 deep learning models can significantly improve upon the current state-of-the-art performance of neural networks for TSC, when evaluated over the UCR/UEA archive: the largest publicly available benchmark for time series analysis. Finally, we show how our proposed Neural Network Ensemble (NNE) is the first time series classifier to outperform COTE while reaching similar performance to the current state-of-the-art ensemble HIVE-COTE.
Submitted 26 April, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
-
Transfer learning for time series classification
Authors:
Hassan Ismail Fawaz,
Germain Forestier,
Jonathan Weber,
Lhassane Idoumghar,
Pierre-Alain Muller
Abstract:
Transfer learning for deep neural networks is the process of first training a base network on a source dataset, and then transferring the learned features (the network's weights) to a second network to be trained on a target dataset. This idea has been shown to improve deep neural networks' generalization capabilities in many computer vision tasks such as image recognition and object localization. Apart from these applications, deep Convolutional Neural Networks (CNNs) have also recently gained popularity in the Time Series Classification (TSC) community. However, unlike for image recognition problems, transfer learning techniques have not yet been investigated thoroughly for the TSC task. This is surprising as the accuracy of deep learning models for TSC could potentially be improved if the model is fine-tuned from a pre-trained neural network instead of training it from scratch. In this paper, we fill this gap by investigating how to transfer deep CNNs for the TSC task. To evaluate the potential of transfer learning, we performed extensive experiments using the UCR archive, which is the largest publicly available TSC benchmark, containing 85 datasets. For each dataset in the archive, we pre-trained a model and then fine-tuned it on the other datasets, resulting in 7140 different deep neural networks. These experiments revealed that transfer learning can improve or degrade the model's predictions depending on the dataset used for transfer. Therefore, in an effort to predict the best source dataset for a given target dataset, we propose a new method relying on Dynamic Time Warping to measure inter-dataset similarities. We describe how our method can guide the transfer to choose the best source dataset, leading to an improvement in accuracy on 71 out of 85 datasets.
Submitted 5 November, 2018;
originally announced November 2018.
-
Bayesian Double Feature Allocation for Phenotyping with Electronic Health Records
Authors:
Yang Ni,
Peter Mueller,
Yuan Ji
Abstract:
We propose a categorical matrix factorization method to infer latent diseases from electronic health records (EHR) data in an unsupervised manner. A latent disease is defined as an unknown biological aberration that causes a set of common symptoms for a group of patients. The proposed approach is based on a novel double feature allocation model which simultaneously allocates features to the rows and the columns of a categorical matrix. Using a Bayesian approach, available prior information on known diseases greatly improves identifiability and interpretability of latent diseases. This includes known diagnoses for patients and known association of diseases with symptoms. We validate the proposed approach by simulation studies including mis-specified models and comparison with sparse latent factor models. In the application to Chinese EHR data, we find interesting results, some of which agree with related clinical and medical knowledge.
Submitted 13 February, 2019; v1 submitted 4 September, 2018;
originally announced September 2018.
-
Deep learning for time series classification: a review
Authors:
Hassan Ismail Fawaz,
Germain Forestier,
Jonathan Weber,
Lhassane Idoumghar,
Pierre-Alain Muller
Abstract:
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
Submitted 14 May, 2019; v1 submitted 12 September, 2018;
originally announced September 2018.
-
Scalable Bayesian Nonparametric Clustering and Classification
Authors:
Yang Ni,
Peter Müller,
Maurice Diesendruck,
Sinead Williamson,
Yitan Zhu,
Yuan Ji
Abstract:
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating data sets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.
Submitted 7 June, 2018;
originally announced June 2018.
-
Discussions of the paper "Sparse graphs using exchangeable random measures" by F. Caron and E. B. Fox
Authors:
Julyan Arbel,
Marco Battiston,
Stefano Favaro,
Antonio Lijoi,
Igor Prünster,
Ramsés H. Mena,
Yang Ni,
Peter Müller
Abstract:
These are written discussions of the paper "Sparse graphs using exchangeable random measures" by François Caron and Emily B. Fox, contributed to the Journal of the Royal Statistical Society Series B.
Submitted 4 July, 2017;
originally announced July 2017.
-
Pattern representation and recognition with accelerated analog neuromorphic systems
Authors:
Mihai A. Petrovici,
Sebastian Schmitt,
Johann Klähn,
David Stöckel,
Anna Schroeder,
Guillaume Bellec,
Johannes Bill,
Oliver Breitwieser,
Ilja Bytschok,
Andreas Grübl,
Maurice Güttler,
Andreas Hartel,
Stephan Hartmann,
Dan Husmann,
Kai Husmann,
Sebastian Jeltsch,
Vitali Karasenko,
Mitja Kleider,
Christoph Koke,
Alexander Kononov,
Christian Mauch,
Eric Müller,
Paul Müller,
Johannes Partzsch,
Thomas Pfeil,
et al. (11 additional authors not shown)
Abstract:
Despite being originally inspired by the central nervous system, artificial neural networks have diverged from their biological archetypes as they have been remodeled to fit particular tasks. In this paper, we review several possibilities to reverse map these architectures to biologically more realistic spiking networks with the aim of emulating them on fast, low-power neuromorphic hardware. Since many of these devices employ analog components, which cannot be perfectly controlled, finding ways to compensate for the resulting effects represents a key challenge. Here, we discuss three different strategies to address this problem: the addition of auxiliary network components for stabilizing activity, the utilization of inherently robust architectures and a training method for hardware-emulated networks that functions without perfect knowledge of the system's dynamics and parameters. For all three scenarios, we corroborate our theoretical considerations with experimental results on accelerated analog neuromorphic platforms.
Submitted 3 July, 2017; v1 submitted 17 March, 2017;
originally announced March 2017.
-
TreeClone: Reconstruction of Tumor Subclone Phylogeny Based on Mutation Pairs using Next Generation Sequencing Data
Authors:
Tianjian Zhou,
Subhajit Sengupta,
Peter Mueller,
Yuan Ji
Abstract:
We present TreeClone, a latent feature allocation model to reconstruct tumor subclones subject to phylogenetic evolution that mimics tumor evolution. Similar to most current methods, we consider data from next-generation sequencing of tumor DNA. Unlike most methods that use information in short reads mapped to single nucleotide variants (SNVs), we consider subclone phylogeny reconstruction using pairs of two proximal SNVs that can be mapped by the same short reads. As part of the Bayesian inference model, we construct a phylogenetic tree prior. The use of the tree structure in the prior greatly strengthens inference. Only subclones that can be explained by a phylogenetic tree are assigned non-negligible probabilities. The proposed Bayesian framework implies posterior distributions on the number of subclones, their genotypes, cellular proportions, and the phylogenetic tree spanned by the inferred subclones. The proposed method is validated against different sets of simulated and real-world data using single and multiple tumor samples. An open source software package is available at http://www.compgenome.org/treeclone.
Submitted 25 October, 2017; v1 submitted 10 March, 2017;
originally announced March 2017.
-
PairClone: A Bayesian Subclone Caller Based on Mutation Pairs
Authors:
Tianjian Zhou,
Peter Mueller,
Subhajit Sengupta,
Yuan Ji
Abstract:
Tumor cell populations can be thought of as being composed of homogeneous cell subpopulations, with each subpopulation being characterized by overlapping sets of single nucleotide variants (SNVs). Such subpopulations are known as subclones and are an important target for precision medicine. Reconstructing such subclones from next-generation sequencing (NGS) data is one of the major challenges in precision medicine. We present PairClone as a new tool to implement this reconstruction. The main idea of PairClone is to model short reads mapped to pairs of proximal SNVs. In contrast, most existing methods use only marginal reads for unpaired SNVs. Using Bayesian nonparametric models, we estimate posterior probabilities of the number, genotypes and population frequencies of subclones in one or more tumor samples. We use the categorical Indian buffet process (cIBP) as a prior probability model for subclones that are represented as vectors of categorical matrices that record the corresponding sets of mutation pairs. Performance of PairClone is assessed using simulated and real datasets. An open source software package can be obtained at http://www.compgenome.org/pairclone.
Submitted 24 February, 2017;
originally announced February 2017.
-
Heterogeneous Reciprocal Graphical Models
Authors:
Yang Ni,
Peter Mueller,
Yitan Zhu,
Yuan Ji
Abstract:
We develop novel hierarchical reciprocal graphical models to infer gene networks from heterogeneous data. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. Thresholding priors are applied to induce sparsity of the estimated networks. In the case of unknown groups, we cluster subjects into subpopulations and jointly estimate cluster-specific gene networks, again using similar hierarchical priors across clusters. We illustrate the proposed approach by simulation studies and two applications with multiplatform genomic data for multiple cancers.
Submitted 21 January, 2018; v1 submitted 18 December, 2016;
originally announced December 2016.
-
A Nonparametric Bayesian Basket Trial Design
Authors:
Yanxun Xu,
Peter Mueller,
Apostolia M Tsimberidou,
Donald Berry
Abstract:
Targeted therapies on the basis of genomic aberration analysis of the tumor have shown promising results in cancer prognosis and treatment. Regardless of tumor type, trials that match patients to targeted therapies for their particular genomic aberrations have become a mainstream direction of therapeutic management of patients with cancer. Therefore, finding the subpopulation of patients who can most benefit from an aberration-specific targeted therapy across multiple cancer types is important. We propose an adaptive Bayesian clinical trial design for patient allocation and subpopulation identification. We start with a decision theoretic approach, including a utility function and a probability model across all possible subpopulation models. The main features of the proposed design and population finding methods are that we allow for variable sets of covariates to be recorded for different patients, adjust for missing data, allow high order interactions of covariates, and adaptively allocate each patient to treatment arms using the posterior predictive probability of which arm is best for each patient. The new method is demonstrated via extensive simulation studies.
Submitted 17 April, 2018; v1 submitted 8 December, 2016;
originally announced December 2016.
-
Reciprocal Graphical Models for Integrative Gene Regulatory Network Analysis
Authors:
Yang Ni,
Yuan Ji,
Peter Mueller
Abstract:
Constructing gene regulatory networks is a fundamental task in systems biology. We introduce a Gaussian reciprocal graphical model for inference about gene regulatory relationships by integrating mRNA gene expression and DNA level information including copy number and methylation. Data integration allows for inference on the directionality of certain regulatory relationships, which would be otherwise indistinguishable due to Markov equivalence. Efficient inference is developed based on simultaneous equation models. Bayesian model selection techniques are adopted to estimate the graph structure. We illustrate our approach by simulations and two applications in ZODIAC pairwise gene interaction analysis and colon adenocarcinoma pathway analysis.
Submitted 22 July, 2016;
originally announced July 2016.
-
A Bayesian feature allocation model for tumor heterogeneity
Authors:
Juhee Lee,
Peter Müller,
Kamalakar Gulukota,
Yuan Ji
Abstract:
We develop a feature allocation model for inference on genetic tumor variation using next-generation sequencing data. Specifically, we record single nucleotide variants (SNVs) based on short reads mapped to the human reference genome and characterize tumor heterogeneity by latent haplotypes defined as a scaffold of SNVs on the same homologous genome. For multiple samples from a single tumor, assuming that each sample is composed of some sample-specific proportions of these haplotypes, we then fit the observed variant allele fractions of SNVs for each sample and estimate the proportions of haplotypes. Varying proportions of haplotypes across samples are evidence of tumor heterogeneity since they imply varying composition of cell subpopulations. Taking a Bayesian perspective, we proceed with a prior probability model for all relevant unknown quantities, including, in particular, a prior probability model on the binary indicators that characterize the latent haplotypes. Such prior models are known as feature allocation models. Specifically, we define a simplified version of the Indian buffet process, one of the most traditional feature allocation models. The proposed model allows overlapping clustering of SNVs in defining latent haplotypes, which reflects the evolutionary process of subclonal expansion in tumor samples.
Submitted 14 September, 2015;
originally announced September 2015.
-
Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes (DPP)
Authors:
Yanxun Xu,
Peter Mueller,
Donatello Telesca
Abstract:
We discuss the use of the determinantal point process (DPP) as a prior for latent structure in biomedical applications, where inference often centers on the interpretation of latent features as biologically or clinically meaningful structure. Typical examples include mixture models, when the terms of the mixture are meant to represent clinically meaningful subpopulations (of patients, genes, etc.). Another class of examples are feature allocation models. We propose the DPP prior as a repulsive prior on latent mixture components in the first example, and as prior on feature-specific parameters in the second case. We argue that the DPP is in general an attractive prior model for latent structure when biologically relevant interpretation of such structure is desired. We illustrate the advantages of DPP prior in three case studies, including inference in mixture models for magnetic resonance images (MRI) and for protein expression, and a feature allocation model for gene expression using data from The Cancer Genome Atlas. An important part of our argument are efficient and straightforward posterior simulation methods. We implement a variation of reversible jump Markov chain Monte Carlo simulation for inference under the DPP prior, using a density with respect to the unit rate Poisson process.
Submitted 16 November, 2015; v1 submitted 26 June, 2015;
originally announced June 2015.
-
A Decision-Theoretic Comparison of Treatments to Resolve Air Leaks After Lung Surgery Based on Nonparametric Modeling
Authors:
Yanxun Xu,
Peter F. Thall,
Peter Mueller,
Mehran J. Reza
Abstract:
We propose a Bayesian nonparametric utility-based group sequential design for a randomized clinical trial to compare a gel sealant to standard care for resolving air leaks after pulmonary resection. Clinically, resolving air leaks in the days soon after surgery is highly important, since longer resolution time produces undesirable complications that require extended hospitalization. The problem of comparing treatments is complicated by the fact that the resolution time distributions are skewed and multi-modal, so using means is misleading. We address these challenges by assuming Bayesian nonparametric probability models for the resolution time distributions and basing the comparative test on weighted means. The weights are elicited as clinical utilities of the resolution times. The proposed design uses posterior expected utilities as group sequential test criteria. The procedure's frequentist properties are studied by extensive simulations.
Submitted 18 July, 2016; v1 submitted 25 June, 2015;
originally announced June 2015.
-
Bayesian Inference for Tumor Subclones Accounting for Sequencing and Structural Variants
Authors:
Juhee Lee,
Peter Mueller,
Subhajit Sengupta,
Kamalakar Gulukota,
Yuan Ji
Abstract:
Tumor samples are heterogeneous. They consist of different subclones that are characterized by differences in DNA nucleotide sequences and copy numbers on multiple loci. Heterogeneity can be measured through the identification of the subclonal copy number and sequence at a selected set of loci. Understanding that the accurate identification of variant allele fractions greatly depends on a precise determination of copy numbers, we develop a Bayesian feature allocation model for jointly calling subclonal copy numbers and the corresponding allele sequences for the same loci. The proposed method utilizes three random matrices, L, Z and w to represent subclonal copy numbers (L), numbers of subclonal variant alleles (Z) and cellular fractions of subclones in samples (w), respectively. The unknown number of subclones implies a random number of columns for these matrices. We use next-generation sequencing data to estimate the subclonal structures through inference on these three matrices. Using simulation studies and a real data analysis, we demonstrate how posterior inference on the subclonal structure is enhanced with the joint modeling of both structure and sequencing variants on subclonal genomes. Software is available at http://compgenome.org/BayClone2.
Submitted 25 September, 2014;
originally announced September 2014.