Showing 1–50 of 59 results for author: Mueller, P

  1. arXiv:2505.13202  [pdf, other]

    stat.AP stat.ME

    SIMBA -- A Bayesian Decision Framework for the Identification of Optimal Biomarker Subgroups for Cancer Basket Clinical Trials

    Authors: Shijie Yuan, Jiaxin Liu, Zhihua Gong, Xia Qin, Crystal Qin, Yuan Ji, Peter Müller

    Abstract: We consider basket trials in which a biomarker-targeting drug may be efficacious for patients across different disease indications. Patients are enrolled if their cells exhibit some levels of biomarker expression. The threshold level is allowed to vary by indication. The proposed SIMBA method uses a decision framework to identify optimal biomarker subgroups (OBS) defined by an optimal biomarker th…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 34 pages, 12 figures

  2. arXiv:2505.06491  [pdf, ps, other]

    stat.ME stat.AP

    Borrowing strength between unaligned binary time-series via Bayesian nonparametric rescaling of Unified Skewed Normal priors

    Authors: Beatrice Cantoni, Giovanni Poli, Elizabeth Juarez-Colunga, Peter Müller

    Abstract: We define a Bayesian semi-parametric model to effectively conduct inference with unaligned longitudinal binary data. The proposed strategy is motivated by data from the Human Epilepsy Project (HEP), which collects seizure occurrence data for epilepsy patients, together with relevant covariates. The model is designed to flexibly accommodate the particular challenges that arise with such data. First…

    Submitted 9 May, 2025; originally announced May 2025.

  3. arXiv:2504.12617  [pdf, other]

    stat.ME stat.AP stat.CO stat.ML

    Bayesian Density-Density Regression with Application to Cell-Cell Communications

    Authors: Khai Nguyen, Yang Ni, Peter Mueller

    Abstract: We introduce a scalable framework for regressing multivariate distributions onto multivariate distributions, motivated by the application of inferring cell-cell communication from population-scale single-cell data. The observed data consist of pairs of multivariate distributions for ligands from one cell type and corresponding receptors from another. For each ordered pair $e=(l,r)$ of cell types…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 42 pages, 24 figures, 1 table

  4. arXiv:2502.17827  [pdf, other]

    stat.ME

    DPGLM: A Semiparametric Bayesian GLM with Inhomogeneous Normalized Random Measures

    Authors: Entejar Alam, Paul J. Rathouz, Peter Mueller

    Abstract: We introduce a novel varying-weight dependent Dirichlet process (DDP) model that extends a recently developed semi-parametric generalized linear model (SPGLM) by adding a nonparametric Bayesian prior on the baseline distribution of the GLM. We show that the resulting model takes the form of an inhomogeneous completely random measure that arises from exponential tilting of a normalized completely r…

    Submitted 28 March, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  5. arXiv:2411.14674  [pdf, other]

    stat.ME stat.AP stat.CO stat.ML

    Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures

    Authors: Khai Nguyen, Peter Mueller

    Abstract: Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering, with density estimation as a secondary goal (Wade and Ghahramani, 2018; Dahl et al., 2022). We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing estimation of the mixing measure (…

    Submitted 7 May, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

    Comments: 45 pages, 4 figures, 6 tables

  6. arXiv:2408.03560  [pdf, other]

    cs.LG stat.ML

    In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models

    Authors: Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy F. Chen

    Abstract: Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and eval…

    Submitted 2 October, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: EMNLP 2024 - Findings

  7. arXiv:2406.15912  [pdf, other]

    stat.ME

    Clustering and Meta-Analysis Using a Mixture of Dependent Linear Tail-Free Priors

    Authors: Bernardo Flores, Peter Mueller

    Abstract: We propose a novel nonparametric Bayesian approach for meta-analysis with event time outcomes. The model is an extension of linear dependent tail-free processes. The extension includes a modification to facilitate (conditionally) conjugate posterior updating and a hierarchical extension with a random partition of studies. The partition is formalized as a Dirichlet process mixture. The model develo…

    Submitted 22 June, 2024; originally announced June 2024.

  8. arXiv:2404.05060  [pdf, other]

    stat.ME

    Dir-SPGLM: A Bayesian semiparametric GLM with data-driven reference distribution

    Authors: Entejar Alam, Peter Müller, Paul J. Rathouz

    Abstract: The recently developed semi-parametric generalized linear model (SPGLM) offers more flexibility as compared to the classical GLM by including the baseline or reference distribution of the response as an additional parameter in the model. However, some inference summaries are not easily generated under existing maximum-likelihood based inference (ML-SPGLM). This includes uncertainty in estimation f…

    Submitted 7 April, 2024; originally announced April 2024.

  9. arXiv:2312.06018  [pdf, other]

    stat.ME

    A Multivariate Polya Tree Model for Meta-Analysis with Event Time Distributions

    Authors: Giovanni Poli, Elena Fountzilas, Apostolia-Maria Tsimberidou, Peter Müller

    Abstract: We develop a non-parametric Bayesian prior for a family of random probability measures by extending the Polya tree ($PT$) prior to a joint prior for a set of probability measures $G_1,\dots,G_n$, suitable for meta-analysis with event time outcomes. In the application to meta-analysis $G_i$ is the event time distribution specific to study $i$. The proposed model defines a regression on study-specif…

    Submitted 8 October, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  10. arXiv:2309.14120  [pdf, other]

    math.ST stat.ME

    Regression with Variable Dimension Covariates

    Authors: Peter Mueller, Fernando Andrés Quintana, Garritt L. Page

    Abstract: Regression is one of the most fundamental statistical inference problems. A broad definition of regression problems is as estimation of the distribution of an outcome using a family of probability models indexed by covariates. Despite the ubiquitous nature of regression problems and the abundance of related methods and results there is a surprising gap in the literature. There are no well establis…

    Submitted 25 September, 2023; originally announced September 2023.

  11. arXiv:2306.08485  [pdf, other]

    stat.ME

    Graph-Aligned Random Partition Model (GARP)

    Authors: Giovanni Rebaudo, Peter Mueller

    Abstract: Bayesian nonparametric mixtures and random partition models are powerful tools for probabilistic clustering. However, standard independent mixture models can be restrictive in some applications such as inference on cell lineage due to the biological relations of the clusters. The increasing availability of large genomic data requires new statistical tools to perform model-based clustering and infe…

    Submitted 17 May, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Journal of the American Statistical Association 2024

  12. Predicting subgroup treatment effects for a new study: Motivations, results and learnings from running a data challenge in a pharmaceutical corporation

    Authors: Björn Bornkamp, Silvia Zaoli, Michela Azzarito, Ruvie Martin, Carsten Philipp Müller, Conor Moloney, Giulia Capestro, David Ohlssen, Mark Baillie

    Abstract: We present the motivation, experience and learnings from a data challenge conducted at a large pharmaceutical corporation on the topic of subgroup identification. The data challenge aimed at exploring approaches to subgroup identification for future clinical trials. To mimic a realistic setting, participants had access to 4 Phase III clinical trials to derive a subgroup and predict its treatment e…

    Submitted 12 April, 2023; originally announced April 2023.

    Journal ref: Pharmaceutical Statistics 23, 495-510 (2024)

  13. arXiv:2208.10138  [pdf, other]

    cs.GT stat.ML

    Learning Correlated Equilibria in Mean-Field Games

    Authors: Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

    Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-…

    Submitted 22 August, 2022; originally announced August 2022.

  14. A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

    Authors: Mauricio Tec, Yunshan Duan, Peter Müller

    Abstract: Reinforcement Learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has…

    Submitted 4 October, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: The American Statistician (2022)

  15. arXiv:2203.11973  [pdf, other]

    cs.LG math.OC stat.ML

    Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

    Authors: Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Élie, Olivier Pietquin, Matthieu Geist

    Abstract: Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quant…

    Submitted 17 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  16. arXiv:2203.09438  [pdf, other]

    cs.LG stat.ML

    An Explainable Stacked Ensemble Model for Static Route-Free Estimation of Time of Arrival

    Authors: Sören Schleibaum, Jörg P. Müller, Monika Sester

    Abstract: To compare alternative taxi schedules and to compute them, as well as to provide insights into an upcoming taxi trip to drivers and passengers, the duration of a trip or its Estimated Time of Arrival (ETA) is predicted. To reach a high prediction precision, machine learning models for ETA are state of the art. One yet unexploited option to further increase prediction precision is to combine multip…

    Submitted 11 January, 2024; v1 submitted 17 March, 2022; originally announced March 2022.

  17. arXiv:2202.01163  [pdf, other]

    stat.ME

    A Recommender System Based on a Double Feature Allocation Model

    Authors: Qiaohui Lin, Peter Mueller

    Abstract: A collaborative filtering recommender system predicts user preferences by discovering common features among users and items. We implement such inference using a Bayesian double feature allocation model, that is, a model for random pairs of subsets. We use an Indian buffet process (IBP) to link users and items to features. Here a feature is a subset of users and a matching subset of items. By train…

    Submitted 2 February, 2022; originally announced February 2022.

  18. arXiv:2201.00068  [pdf, other]

    stat.ME stat.AP

    Bayesian Nonparametric Common Atoms Regression for Generating Synthetic Controls in Clinical Trials

    Authors: Noirrit Kiran Chandra, Abhra Sarkar, John F. de Groot, Ying Yuan, Peter Müller

    Abstract: The availability of electronic health records (EHR) has opened opportunities to supplement increasingly expensive and difficult to carry out randomized controlled trials (RCT) with evidence from readily available real world data. In this paper, we use EHR data to construct synthetic control arms for treatment-only single arm trials. We propose a novel nonparametric Bayesian common atoms mixture mo…

    Submitted 6 May, 2023; v1 submitted 31 December, 2021; originally announced January 2022.

  19. arXiv:2112.07755  [pdf, other]

    stat.ME math.ST

    Separate Exchangeability as Modeling Principle in Bayesian Nonparametrics

    Authors: Giovanni Rebaudo, Qiaohui Lin, Peter Mueller

    Abstract: We argue for the use of separate exchangeability as a modeling principle in Bayesian nonparametric (BNP) inference. Separate exchangeability is \emph{de facto} widely applied in the Bayesian parametric case, e.g., it naturally arises in simple mixed models. However, while in some areas, such as random graphs, separate and (closely related) joint exchangeability are widely used, it is curiously und…

    Submitted 20 June, 2024; v1 submitted 14 December, 2021; originally announced December 2021.

  20. arXiv:2111.12244  [pdf, other]

    stat.ME

    A Unified Decision Framework for Phase I Dose-Finding Designs

    Authors: Yunshan Duan, Shijie Yuan, Yuan Ji, Peter Mueller

    Abstract: The purpose of a phase I dose-finding clinical trial is to investigate the toxicity profiles of various doses for a new drug and identify the maximum tolerated dose. Over the past three decades, various dose-finding designs have been proposed and discussed, including conventional model-based designs, new model-based designs using toxicity probability intervals, and rule-based designs. We present a…

    Submitted 23 November, 2021; originally announced November 2021.

  21. arXiv:2108.08439  [pdf, other]

    stat.ME

    Bayesian Semiparametric Hidden Markov Tensor Partition Models for Longitudinal Data with Local Variable Selection

    Authors: Giorgio Paulon, Peter Müller, Abhra Sarkar

    Abstract: We present a flexible Bayesian semiparametric mixed model for longitudinal data analysis in the presence of potentially high-dimensional categorical covariates. Building on a novel hidden Markov tensor decomposition technique, our proposed method allows the fixed effects components to vary between dependent random partitions of the covariate space at different time points. The mechanism not only a…

    Submitted 4 August, 2022; v1 submitted 18 August, 2021; originally announced August 2021.

  22. arXiv:2107.11316  [pdf, other]

    stat.ME

    Bayesian Scalable Precision Factor Analysis for Massive Sparse Gaussian Graphical Models

    Authors: Noirrit Kiran Chandra, Peter Mueller, Abhra Sarkar

    Abstract: We propose a novel approach to estimating the precision matrix of multivariate Gaussian data that relies on decomposing them into a low-rank and a diagonal component. Such decompositions are very popular for modeling large covariance matrices as they admit a latent factor based representation that allows easy inference. The same is however not true for precision matrices due to the lack of computa…

    Submitted 16 August, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

  23. arXiv:2105.04451  [pdf, other]

    stat.ME

    Search Algorithms and Loss Functions for Bayesian Clustering

    Authors: David B. Dahl, Devin J. Johnson, Peter Mueller

    Abstract: We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, the minimization of the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in a random order and…

    Submitted 10 May, 2021; originally announced May 2021.

  24. arXiv:2009.06460  [pdf, other]

    stat.ME

    Bayesian Nonparametric Bivariate Survival Regression for Current Status Data

    Authors: Giorgio Paulon, Peter Müller, Victor G. Sal Y Rosas

    Abstract: We consider nonparametric inference for event time distributions based on current status data. We show that in this scenario conventional mixture priors, including the popular Dirichlet process mixture prior, lead to biologically uninterpretable results as they unnaturally skew the probability mass for the event times toward the extremes of the observed data. Simple assumptions on dependent censor…

    Submitted 22 September, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

  25. arXiv:2007.06129  [pdf, other]

    stat.ME

    The Dependent Dirichlet Process and Related Models

    Authors: Fernando A. Quintana, Peter Mueller, Alejandro Jara, Steven N. MacEachern

    Abstract: Standard regression approaches assume that some finite number of the response distribution characteristics, such as location and scale, change as a (parametric or nonparametric) function of predictors. However, it is not always appropriate to assume a location/scale representation, where the error distribution has unchanging shape over the predictor space. In fact, it often happens in applied rese…

    Submitted 12 July, 2020; originally announced July 2020.

    MSC Class: 62F15 ACM Class: G.3

  26. arXiv:1912.13119  [pdf, other]

    stat.ME

    Clustering and Prediction with Variable Dimension Covariates

    Authors: Garritt L. Page, Fernando A. Quintana, Peter Müller

    Abstract: In many applied fields incomplete covariate vectors are commonly encountered. It is well known that this can be problematic when making inference on model parameters, but its impact on prediction performance is less understood. We develop a method based on covariate dependent partition models that seamlessly handles missing covariates while completely avoiding any type of imputation. The method we…

    Submitted 12 July, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

  27. arXiv:1910.12174  [pdf, other]

    stat.AP

    A Semi-parametric Bayesian Approach to Population Finding with Time-to-Event and Toxicity Data in a Randomized Clinical Trial

    Authors: Satoshi Morita, Peter Müller, Hiroyasu Abe

    Abstract: A utility-based Bayesian population finding (BaPoFi) method was proposed by Morita and Müller (2017, Biometrics, 1355-1365) to analyze data from a randomized clinical trial with the aim of identifying good predictive baseline covariates for optimizing the target population for a future study. The approach casts the population finding process as a formal decision problem together with a flexible pr…

    Submitted 26 October, 2019; originally announced October 2019.

    Comments: 25 pages, 4 figures

  28. InceptionTime: Finding AlexNet for Time Series Classification

    Authors: Hassan Ismail Fawaz, Benjamin Lucas, Germain Forestier, Charlotte Pelletier, Daniel F. Schmidt, Jonathan Weber, Geoffrey I. Webb, Lhassane Idoumghar, Pierre-Alain Muller, François Petitjean

    Abstract: This paper brings deep learning at the forefront of research into Time Series Classification (TSC). TSC is the area of machine learning tasked with the categorization (or labelling) of time series. The last few decades of work in this area have led to significant progress in the accuracy of classifiers, with the state of the art now represented by the HIVE-COTE algorithm. While extremely accurate,…

    Submitted 5 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

  29. arXiv:1908.07319  [pdf, other]

    cs.LG cs.AI stat.ML

    Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks

    Authors: Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller

    Abstract: Purpose: Manual feedback from senior surgeons observing less experienced trainees is a laborious task that is very expensive, time-consuming and prone to subjectivity. With the number of surgical procedures increasing annually, there is an unprecedented need to provide an accurate, objective and automatic evaluation of trainees' surgical skills in order to improve surgical practice. Methods: In th…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: Accepted at IJCARS Special Issue for MICCAI 2018

  30. arXiv:1908.06756  [pdf, other]

    cs.LG cs.AI stat.ML

    BOAH: A Tool Suite for Multi-Fidelity Bayesian Optimization & Analysis of Hyperparameters

    Authors: Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Joshua Marben, Philipp Müller, Frank Hutter

    Abstract: Hyperparameter optimization and neural architecture search can become prohibitively expensive for regular black-box Bayesian optimization because the training and evaluation of a single model can easily take several hours. To overcome this, we introduce a comprehensive tool suite for effective multi-fidelity Bayesian optimization and the analysis of its runs. The suite, written in Python, provides…

    Submitted 16 August, 2019; originally announced August 2019.

  31. arXiv:1906.12309  [pdf, other]

    stat.CO stat.ML

    Consensus Monte Carlo for Random Subsets using Shared Anchors

    Authors: Yang Ni, Yuan Ji, Peter Mueller

    Abstract: We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model,…

    Submitted 25 February, 2020; v1 submitted 28 June, 2019; originally announced June 2019.

  32. arXiv:1904.07302  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Automatic alignment of surgical videos using kinematic data

    Authors: Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, François Petitjean, Lhassane Idoumghar, Pierre-Alain Muller

    Abstract: Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed the surgical education systems worldwide. With the advent of Operation Room 2.0, recording video, kinematic and many other types of data during the surgery became an easy task, thus allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently…

    Submitted 26 April, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted at AIME 2019

  33. arXiv:1903.08509  [pdf, other]

    stat.ME

    A Bayesian Nonparametric Approach for Evaluating the Causal Effect of Treatment in Randomized Trials with Semi-Competing Risks

    Authors: Yanxun Xu, Daniel Scharfstein, Peter Müller, Michael Daniels

    Abstract: We develop a Bayesian nonparametric (BNP) approach to evaluate the causal effect of treatment in a randomized trial where a nonterminal event may be censored by a terminal event, but not vice versa (i.e., semi-competing risks). Based on the idea of principal stratification, we define a novel estimand for the causal effect of treatment on the nonterminal event. We introduce identification assumptio…

    Submitted 21 July, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

  34. arXiv:1903.07054  [pdf, other]

    cs.LG cs.CR stat.ML

    Adversarial Attacks on Deep Neural Networks for Time Series Classification

    Authors: Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller

    Abstract: Time Series Classification (TSC) problems are encountered in many real life data mining tasks ranging from medicine and security to human activity recognition and food safety. With the recent success of deep neural networks in various domains such as computer vision and natural language processing, researchers started adopting these techniques for solving time series data mining problems. However,…

    Submitted 26 April, 2019; v1 submitted 17 March, 2019; originally announced March 2019.

    Comments: Accepted at IJCNN 2019

  35. Deep Neural Network Ensembles for Time Series Classification

    Authors: Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller

    Abstract: Deep neural networks have revolutionized many fields such as computer vision and natural language processing. Inspired by this recent success, deep learning started to show promising results for Time Series Classification (TSC). However, neural networks are still behind the state-of-the-art TSC algorithms, that are currently composed of ensembles of 37 non deep learning based classifiers. We attri…

    Submitted 26 April, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: Accepted at IJCNN 2019

  36. Transfer learning for time series classification

    Authors: Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller

    Abstract: Transfer learning for deep neural networks is the process of first training a base network on a source dataset, and then transferring the learned features (the network's weights) to a second network to be trained on a target dataset. This idea has been shown to improve deep neural network's generalization capabilities in many computer vision tasks such as image recognition and object localization.…

    Submitted 5 November, 2018; originally announced November 2018.

    Comments: Accepted at IEEE International Conference on Big Data 2018

  37. arXiv:1809.08988  [pdf, other]

    stat.AP

    Bayesian Double Feature Allocation for Phenotyping with Electronic Health Records

    Authors: Yang Ni, Peter Mueller, Yuan Ji

    Abstract: We propose a categorical matrix factorization method to infer latent diseases from electronic health records (EHR) data in an unsupervised manner. A latent disease is defined as an unknown biological aberration that causes a set of common symptoms for a group of patients. The proposed approach is based on a novel double feature allocation model which simultaneously allocates features to the rows a…

    Submitted 13 February, 2019; v1 submitted 4 September, 2018; originally announced September 2018.

    Comments: 32 pages, 8 figures, 1 table

  38. arXiv:1809.04356  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep learning for time series classification: a review

    Authors: Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller

    Abstract: Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revo…

    Submitted 14 May, 2019; v1 submitted 12 September, 2018; originally announced September 2018.

    Comments: Accepted at Data Mining and Knowledge Discovery

  39. arXiv:1806.02670  [pdf, other]

    stat.CO stat.ME

    Scalable Bayesian Nonparametric Clustering and Classification

    Authors: Yang Ni, Peter Müller, Maurice Diesendruck, Sinead Williamson, Yitan Zhu, Yuan Ji

    Abstract: We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach makes inference for a wide range of Bayesian nonparametric mixture models applicable t…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: 29 pages, 3 figures, 2 tables

  40. arXiv:1707.00842  [pdf, other]

    stat.ME

    Discussions of the paper "Sparse graphs using exchangeable random measures" by F. Caron and E. B. Fox

    Authors: Julyan Arbel, Marco Battiston, Stefano Favaro, Antonio Lijoi, Igor Prünster, Ramsés H. Mena, Yang Ni, Peter Müller

    Abstract: These are written discussions of the paper "Sparse graphs using exchangeable random measures" by François Caron and Emily B. Fox, contributed to the Journal of the Royal Statistical Society Series B.

    Submitted 4 July, 2017; originally announced July 2017.

    Comments: To be published in the Journal of the Royal Statistical Society, Series B, volume 79. 4 pages, 1 figure

  41. arXiv:1703.06043  [pdf, other]

    q-bio.NC cs.NE stat.ML

    Pattern representation and recognition with accelerated analog neuromorphic systems

    Authors: Mihai A. Petrovici, Sebastian Schmitt, Johann Klähn, David Stöckel, Anna Schroeder, Guillaume Bellec, Johannes Bill, Oliver Breitwieser, Ilja Bytschok, Andreas Grübl, Maurice Güttler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Sebastian Jeltsch, Vitali Karasenko, Mitja Kleider, Christoph Koke, Alexander Kononov, Christian Mauch, Eric Müller, Paul Müller, Johannes Partzsch, Thomas Pfeil, et al. (11 additional authors not shown)

    Abstract: Despite being originally inspired by the central nervous system, artificial neural networks have diverged from their biological archetypes as they have been remodeled to fit particular tasks. In this paper, we review several possibilities to reverse map these architectures to biologically more realistic spiking networks with the aim of emulating them on fast, low-power neuromorphic hardware. Since…

    Submitted 3 July, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

    Comments: accepted at ISCAS 2017

    Journal ref: Circuits and Systems (ISCAS), 2017 IEEE International Symposium on

  42. arXiv:1703.03853  [pdf, other]

    stat.AP

    TreeClone: Reconstruction of Tumor Subclone Phylogeny Based on Mutation Pairs using Next Generation Sequencing Data

    Authors: Tianjian Zhou, Subhajit Sengupta, Peter Mueller, Yuan Ji

    Abstract: We present TreeClone, a latent feature allocation model to reconstruct tumor subclones subject to phylogenetic evolution that mimics tumor evolution. Similar to most current methods, we consider data from next-generation sequencing of tumor DNA. Unlike most methods that use information in short reads mapped to single nucleotide variants (SNVs), we consider subclone phylogeny reconstruction using p…

    Submitted 25 October, 2017; v1 submitted 10 March, 2017; originally announced March 2017.

  43. PairClone: A Bayesian Subclone Caller Based on Mutation Pairs

    Authors: Tianjian Zhou, Peter Mueller, Subhajit Sengupta, Yuan Ji

    Abstract: Tumor cell populations can be thought of as being composed of homogeneous cell subpopulations, with each subpopulation being characterized by overlapping sets of single nucleotide variants (SNVs). Such subpopulations are known as subclones and are an important target for precision medicine. Reconstructing such subclones from next-generation sequencing (NGS) data is one of the major challenges in p…

    Submitted 24 February, 2017; originally announced February 2017.

    Journal ref: Journal of the Royal Statistical Society: Series C (Applied Statistics), 2019

  44. arXiv:1612.06045  [pdf, other]

    stat.ME

    Heterogeneous Reciprocal Graphical Models

    Authors: Yang Ni, Peter Mueller, Yitan Zhu, Yuan Ji

    Abstract: We develop novel hierarchical reciprocal graphical models to infer gene networks from heterogeneous data. In the case of data that can be naturally divided into known groups, we propose to connect graphs by introducing a hierarchical prior across group-specific graphs, including a correlation on edge strengths across graphs. Thresholding priors are applied to induce sparsity of the estimated netwo…

    Submitted 21 January, 2018; v1 submitted 18 December, 2016; originally announced December 2016.

  45. arXiv:1612.02705  [pdf, other]

    stat.AP

    A Nonparametric Bayesian Basket Trial Design

    Authors: Yanxun Xu, Peter Mueller, Apostolia M Tsimberidou, Donald Berry

    Abstract: Targeted therapies on the basis of genomic aberrations analysis of the tumor have shown promising results in cancer prognosis and treatment. Regardless of tumor type, trials that match patients to targeted therapies for their particular genomic aberrations have become a mainstream direction of therapeutic management of patients with cancer. Therefore, finding the subpopulation of patients who can…

    Submitted 17 April, 2018; v1 submitted 8 December, 2016; originally announced December 2016.

  46. arXiv:1607.06849  [pdf, other]

    stat.ME

    Reciprocal Graphical Models for Integrative Gene Regulatory Network Analysis

    Authors: Yang Ni, Yuan Ji, Peter Mueller

    Abstract: Constructing gene regulatory networks is a fundamental task in systems biology. We introduce a Gaussian reciprocal graphical model for inference about gene regulatory relationships by integrating mRNA gene expression and DNA level information including copy number and methylation. Data integration allows for inference on the directionality of certain regulatory relationships, which would be otherw…

    Submitted 22 July, 2016; originally announced July 2016.

    Comments: 20 pages, 6 figures, 1 table

  47. arXiv:1509.04026  [pdf, ps, other]

    stat.AP q-bio.GN q-bio.PE

    A Bayesian feature allocation model for tumor heterogeneity

    Authors: Juhee Lee, Peter Müller, Kamalakar Gulukota, Yuan Ji

    Abstract: We develop a feature allocation model for inference on genetic tumor variation using next-generation sequencing data. Specifically, we record single nucleotide variants (SNVs) based on short reads mapped to human reference genome and characterize tumor heterogeneity by latent haplotypes defined as a scaffold of SNVs on the same homologous genome. For multiple samples from a single tumor, assuming…

    Submitted 14 September, 2015; originally announced September 2015.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS817 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS817

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 621-639

  48. arXiv:1506.08253  [pdf, other]

    stat.ME stat.AP

    Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes (DPP)

    Authors: Yanxun Xu, Peter Mueller, Donatello Telesca

    Abstract: We discuss the use of the determinantal point process (DPP) as a prior for latent structure in biomedical applications, where inference often centers on the interpretation of latent features as biologically or clinically meaningful structure. Typical examples include mixture models, when the terms of the mixture are meant to represent clinically meaningful subpopulations (of patients, genes, etc.)…

    Submitted 16 November, 2015; v1 submitted 26 June, 2015; originally announced June 2015.

  49. arXiv:1506.07687  [pdf, other]

    stat.AP

    A Decision-Theoretic Comparison of Treatments to Resolve Air Leaks After Lung Surgery Based on Nonparametric Modeling

    Authors: Yanxun Xu, Peter F. Thall, Peter Mueller, Mehran J. Reza

    Abstract: We propose a Bayesian nonparametric utility-based group sequential design for a randomized clinical trial to compare a gel sealant to standard care for resolving air leaks after pulmonary resection. Clinically, resolving air leaks in the days soon after surgery is highly important, since longer resolution time produces undesirable complications that require extended hospitalization. The problem of…

    Submitted 18 July, 2016; v1 submitted 25 June, 2015; originally announced June 2015.

  50. arXiv:1409.7158  [pdf, other]

    stat.ME q-bio.GN

    Bayesian Inference for Tumor Subclones Accounting for Sequencing and Structural Variants

    Authors: Juhee Lee, Peter Mueller, Subhajit Sengupta, Kamalakar Gulukota, Yuan Ji

    Abstract: Tumor samples are heterogeneous. They consist of different subclones that are characterized by differences in DNA nucleotide sequences and copy numbers on multiple loci. Heterogeneity can be measured through the identification of the subclonal copy number and sequence at a selected set of loci. Understanding that the accurate identification of variant allele fractions greatly depends on a precise…

    Submitted 25 September, 2014; originally announced September 2014.

    Comments: 26 pages, 11 figures