-
Leveraging External Data for Testing Experimental Therapies with Biomarker Interactions in Randomized Clinical Trials
Authors:
Boyu Ren,
Federico Ferrari,
Sandra Fortini,
Steffen Ventz,
Lorenzo Trippa
Abstract:
In oncology the efficacy of novel therapeutics often differs across patient subgroups, and these variations are difficult to predict during the initial phases of the drug development process. The relation between the power of randomized clinical trials and heterogeneous treatment effects has been discussed by several authors. In particular, false negative results are likely to occur when the treatment effects concentrate in a subpopulation but the study design did not account for potential heterogeneous treatment effects. The use of external data from completed clinical studies and electronic health records has the potential to improve decision-making throughout the development of new therapeutics, from early-stage trials to registration. Here we discuss the use of external data to evaluate experimental treatments with potential heterogeneous treatment effects. We introduce a permutation procedure to test, at the completion of a randomized clinical trial, the null hypothesis that the experimental therapy does not improve the primary outcomes in any subpopulation. The permutation test leverages the available external data to increase power. The procedure also controls the false positive rate at the desired $\alpha$-level without restrictive assumptions on the external data, for example, in scenarios with unmeasured confounders, different pre-treatment patient profiles in the trial population compared to the external data, and other discrepancies between the trial and the external data. We illustrate that the permutation test is optimal according to an interpretable criterion and discuss examples based on asymptotic results and simulations, followed by a retrospective analysis of individual patient-level data from a collection of glioblastoma clinical trials.
Submitted 4 June, 2025;
originally announced June 2025.
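The permutation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure or test statistic: the function names, the max-over-subgroups statistic, and the pooling of external controls are all assumptions made for illustration. The point it shows is that only the trial's randomized treatment labels are permuted while the external data stay fixed, so the p-value remains valid under the sharp null even if the external data differ from the trial population.

```python
import numpy as np

def max_subgroup_stat(y, treat, subgroup, y_ext, subgroup_ext):
    """Hypothetical statistic: largest subgroup-wise gap between the treated
    mean and the mean of pooled (trial + external) controls."""
    stats = []
    for g in np.unique(subgroup):
        treated = y[(subgroup == g) & (treat == 1)]
        controls = np.concatenate([
            y[(subgroup == g) & (treat == 0)],  # trial controls
            y_ext[subgroup_ext == g],           # external controls (assumed untreated)
        ])
        if treated.size == 0 or controls.size == 0:
            continue  # skip subgroups where the gap is undefined
        stats.append(treated.mean() - controls.mean())
    return max(stats) if stats else -np.inf

def permutation_pvalue(y, treat, subgroup, y_ext, subgroup_ext,
                       n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation p-value.

    Only the trial treatment labels are shuffled; outcomes, subgroup labels,
    and all external data are held fixed.
    """
    rng = np.random.default_rng(seed)
    observed = max_subgroup_stat(y, treat, subgroup, y_ext, subgroup_ext)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(treat)  # re-randomize trial assignments
        stat = max_subgroup_stat(y, permuted, subgroup, y_ext, subgroup_ext)
        if stat >= observed:
            exceed += 1
    # The +1 terms count the observed assignment itself, so the p-value is never 0.
    return (exceed + 1) / (n_perm + 1)
```

Calling permutation_pvalue with NumPy arrays of trial outcomes, 0/1 treatment labels, subgroup labels, and the corresponding external arrays returns a value in (0, 1]; rejecting when it is at most the chosen level gives a level-controlled test under this sketch's assumptions.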
-
Bayesian Joint Additive Factor Models for Multiview Learning
Authors:
Niccolo Anceschi,
Federico Ferrari,
David B. Dunson,
Himel Mallick
Abstract:
It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components. We ensure identifiability via a novel dependent cumulative shrinkage process (D-CUSP) prior. We provide an efficient implementation via a partially collapsed Gibbs sampler and extend our approach to allow flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.
Submitted 10 January, 2025; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Increasing the skill of short-term wind speed ensemble forecasts combining forecasts and observations via a new dynamic calibration
Authors:
Gabriele Casciaro,
Francesco Ferrari,
Daniele Lagomarsino Oneto,
Andrea Lira-Loarca,
Andrea Mazzino
Abstract:
All numerical weather prediction models used for the wind industry need to produce their forecasts starting from the main synoptic hours 00, 06, 12, and 18 UTC, once the analysis becomes available. The six-hour latency between two consecutive model runs calls for strategies to fill the gap by providing new, accurate predictions at least hourly. This meets the demand from traders and system regulators for frequent, accurate, and up-to-date information with which to continuously adapt their work strategies. Here, we propose a strategy where quasi-real-time observed wind speeds and weather model predictions are combined by means of a novel Ensemble Model Output Statistics (EMOS) strategy. The success of our strategy is measured by comparisons against observed wind speed from SYNOP stations over Italy in the years 2018 and 2019.
Submitted 28 January, 2022;
originally announced January 2022.
-
Efficiently resolving rotational ambiguity in Bayesian matrix sampling with matching
Authors:
Evan Poworoznek,
Niccolo Anceschi,
Federico Ferrari,
David Dunson
Abstract:
A wide class of Bayesian models involves unidentifiable random matrices that display rotational ambiguity, with the Gaussian factor model being a typical example. A rich variety of Markov chain Monte Carlo (MCMC) algorithms has been proposed for sampling the parameters of these models. However, without identifiability constraints, reliable posterior summaries of the parameters cannot be obtained directly from the MCMC output. As an alternative, we propose a computationally efficient post-processing algorithm that allows inference on non-identifiable parameters. We first orthogonalize the posterior samples using Varimax and then tackle label and sign switching with a greedy matching algorithm. We compare the performance and computational complexity with other methods using a simulation study and chemical exposures data. The algorithm implementation is available in the infinitefactor R package on CRAN.
Submitted 15 August, 2024; v1 submitted 29 July, 2021;
originally announced July 2021.
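The matching step described in the abstract above can be sketched as follows. This is an illustrative Python sketch only, not the infinitefactor implementation (which is an R package); the function name greedy_align and the inner-product similarity are assumptions. It takes one posterior sample of a loading matrix, assumed already rotated (for example by Varimax), and greedily reorders and sign-flips its columns to agree with a reference matrix.

```python
import numpy as np

def greedy_align(sample, reference):
    """Permute and sign-flip the columns of `sample` to match `reference`.

    Both arguments are (p, k) arrays. At each step the remaining pair of
    (reference column, sample column) with the largest absolute inner
    product is matched, and the sign is chosen to make that product positive.
    """
    k = reference.shape[1]
    sim = reference.T @ sample          # (k, k) signed similarities
    abs_sim = np.abs(sim)
    aligned = np.empty(sample.shape, dtype=float)
    free_ref = set(range(k))            # reference columns not yet matched
    free_smp = set(range(k))            # sample columns not yet matched
    for _ in range(k):
        # Greedy choice: best remaining pair by absolute similarity.
        i, j = max(((r, s) for r in free_ref for s in free_smp),
                   key=lambda pair: abs_sim[pair])
        sign = 1.0 if sim[i, j] >= 0 else -1.0
        aligned[:, i] = sign * sample[:, j]
        free_ref.remove(i)
        free_smp.remove(j)
    return aligned
```

Applying greedy_align to every posterior draw against a common reference (for instance, one chosen draw) makes column-wise posterior means and intervals meaningful. A greedy match is cheaper than solving the full assignment problem but is not guaranteed to find the globally best permutation.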
-
Identifying main effects and interactions among exposures using Gaussian processes
Authors:
Federico Ferrari,
David B. Dunson
Abstract:
This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instead on selection of main effects and interactions. For interpretability, we decompose the expected health outcome into a linear main effect, pairwise interactions, and a non-linear deviation. Our interest is in model selection for these different components, accounting for uncertainty and addressing non-identifiability between the linear and nonparametric components of the semiparametric model. We propose a Bayesian approach to inference, placing variable selection priors on the different components, and developing a Markov chain Monte Carlo (MCMC) algorithm. A key component of our approach is the incorporation of a heredity constraint to only include interactions in the presence of main effects, effectively reducing dimensionality of the model search. We adapt a projection approach developed in the spatial statistics literature to enforce identifiability in modeling the nonparametric component using a Gaussian process. We also employ a dimension reduction strategy to sample the non-linear random effects that aids the mixing of the MCMC algorithm. The proposed MixSelect framework is evaluated using a simulation study, and is illustrated using data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.
Submitted 16 April, 2020; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Nonparametric Bayesian multi-armed bandits for single cell experiment design
Authors:
Federico Camerlenghi,
Bianca Dumitrascu,
Federico Ferrari,
Barbara E. Engelhardt,
Stefano Favaro
Abstract:
The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large scale experiment for the collection of scRNA-seq data for the purpose of, but not limited to, creating cell atlases. Our approach relies on the following tools: i) a hierarchical Pitman-Yor prior that recapitulates biological assumptions regarding cellular differentiation, and ii) a Thompson sampling multi-armed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed by using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We empirically show that our approach outperforms state-of-the-art methods and achieves near-Oracle performance on simulated and scRNA-seq data alike. HPY-TS code is available at https://github.com/fedfer/HPYsinglecell.
Submitted 20 September, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
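The Thompson sampling strategy mentioned in the abstract above can be illustrated with a deliberately simplified sketch. It swaps the paper's hierarchical Pitman-Yor model for independent Beta-Bernoulli arms, so it is not the HPY-TS algorithm; the function name, the Beta(1, 1) priors, and the binary "discovery" reward are assumptions made for illustration. Each arm stands for a candidate experiment, and a reward of 1 stands for a successful discovery.

```python
import numpy as np

def thompson_sampling(true_probs, n_rounds=1000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was chosen.

    `true_probs` holds the (unknown to the algorithm) success probability of
    each arm and is used only to simulate rewards.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    successes = np.ones(n_arms)   # Beta(1, 1) prior: alpha parameters
    failures = np.ones(n_arms)    # Beta(1, 1) prior: beta parameters
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily with respect to the draws.
        draws = rng.beta(successes, failures)
        arm = int(np.argmax(draws))
        reward = rng.random() < true_probs[arm]   # simulated Bernoulli outcome
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because arms are chosen by sampling from the posterior rather than by its mean, uncertain arms are still tried occasionally (exploration), while arms with consistently high rewards come to dominate the budget (exploitation).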
-
Bayesian Factor Analysis for Inference on Interactions
Authors:
Federico Ferrari,
David B Dunson
Abstract:
This article is motivated by the problem of inference on interactions among chemical exposures impacting human health outcomes. Chemicals often co-occur in the environment or in synthetic mixtures and as a result exposure levels can be highly correlated. We propose a latent factor joint model, which includes shared factors in both the predictor and response components while assuming conditional independence. By including a quadratic regression in the latent variables in the response component, we induce flexible dimension reduction in characterizing main effects and interactions. We propose a Bayesian approach to inference under this Factor analysis for INteractions (FIN) framework. Through appropriate modifications of the factor modeling structure, FIN can accommodate higher order interactions and multivariate outcomes. We provide theory on posterior consistency and the impact of misspecifying the number of factors. We evaluate the performance using a simulation study and data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.
Submitted 8 January, 2020; v1 submitted 25 April, 2019;
originally announced April 2019.