
Leveraging External Data for Testing Experimental Therapies with Biomarker Interactions in Randomized Clinical Trials

Authors: Boyu Ren, Federico Ferrari, Sandra Fortini, Steffen Ventz, Lorenzo Trippa

Abstract: In oncology the efficacy of novel therapeutics often differs across patient subgroups, and these variations are difficult to predict during the initial phases of the drug development process. The relation between the power of randomized clinical trials and heterogeneous treatment effects has been discussed by several authors. In particular, false negative results are likely to occur when the treatment effects concentrate in a subpopulation but the study design did not account for potential heterogeneous treatment effects. The use of external data from completed clinical studies and electronic health records has the potential to improve decision-making throughout the development of new therapeutics, from early-stage trials to registration. Here we discuss the use of external data to evaluate experimental treatments with potential heterogeneous treatment effects. We introduce a permutation procedure to test, at the completion of a randomized clinical trial, the null hypothesis that the experimental therapy does not improve the primary outcomes in any subpopulation. The permutation test leverages the available external data to increase power. Moreover, the procedure controls the false positive rate at the desired $\alpha$-level without restrictive assumptions on the external data, for example, in scenarios with unmeasured confounders, different pre-treatment patient profiles in the trial population compared to the external data, and other discrepancies between the trial and the external data.
We illustrate that the permutation test is optimal according to an interpretable criterion and discuss examples based on asymptotic results and simulations, followed by a retrospective analysis of individual patient-level data from a collection of glioblastoma clinical trials.

Submitted 4 June, 2025; originally announced June 2025.
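
A minimal sketch of the basic idea of a permutation test for the global null "no improvement in any subpopulation", assuming a continuous outcome and a discrete biomarker subgroup label. It deliberately omits the external data that the paper's procedure uses to gain power; the statistic (largest treated-minus-control mean difference across subgroups), the within-subgroup re-randomization, and the helper names `max_subgroup_stat` and `permutation_pvalue` are illustrative assumptions, not the authors' method.

```python
import numpy as np

def max_subgroup_stat(y, treat, group):
    # Largest treated-minus-control mean difference over biomarker subgroups.
    diffs = []
    for g in np.unique(group):
        in_g = group == g
        yt, yc = y[in_g & (treat == 1)], y[in_g & (treat == 0)]
        if yt.size and yc.size:  # skip subgroups lacking one of the two arms
            diffs.append(yt.mean() - yc.mean())
    return max(diffs, default=-np.inf)

def permutation_pvalue(y, treat, group, n_perm=2000, seed=0):
    # Hypothetical helper: re-randomizes treatment labels within each subgroup,
    # which keeps the number of treated and control patients per subgroup fixed.
    rng = np.random.default_rng(seed)
    observed = max_subgroup_stat(y, treat, group)
    exceed = 0
    for _ in range(n_perm):
        perm = treat.copy()
        for g in np.unique(group):
            idx = np.flatnonzero(group == g)
            perm[idx] = rng.permutation(treat[idx])
        if max_subgroup_stat(y, perm, group) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling, so the p-value is never zero.
    return (exceed + 1) / (n_perm + 1)

# Synthetic trial (illustrative data): the effect is confined to subgroup 1.
rng = np.random.default_rng(1)
n = 200
group = rng.integers(0, 2, n)
treat = rng.integers(0, 2, n)
y = rng.normal(size=n) + 2.0 * treat * (group == 1)
p = permutation_pvalue(y, treat, group, n_perm=500)
```

Taking the maximum over subgroups keeps power when the benefit is concentrated in one biomarker-defined subpopulation, which a pooled difference in means would dilute.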

arXiv:2406.00778

Bayesian Joint Additive Factor Models for Multiview Learning

Authors: Niccolo Anceschi, Federico Ferrari, David B. Dunson, Himel Mallick

Abstract: It is increasingly common in a wide variety of applied settings to collect data of multiple different types on the same set of samples. Our particular focus in this article is on studying relationships between such multiview features and responses. A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes. It is of interest to infer dependence within and across views while combining multimodal information to improve the prediction of outcomes. The signal-to-noise ratio can vary substantially across views, motivating more nuanced statistical tools beyond standard late and early fusion. This challenge comes with the need to preserve interpretability, select features, and obtain accurate uncertainty quantification. We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components. We ensure identifiability via a novel dependent cumulative shrinkage process (D-CUSP) prior. We provide an efficient implementation via a partially collapsed Gibbs sampler and extend our approach to allow flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors. Our open-source software (R package) is available at https://github.com/niccoloanceschi/jafar.

Submitted 10 January, 2025; v1 submitted 2 June, 2024; originally announced June 2024.

MSC Class: 62F15
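
The D-CUSP prior is specific to this paper; as background only, here is a minimal sketch of a prior draw from the standard (non-dependent) cumulative shrinkage process, which the dependent version presumably extends. Column variances are either a small "spike" value or an inverse-gamma "slab" draw, and the spike probability is a cumulative sum of stick-breaking weights, so later factor columns are increasingly likely to be shrunk. The function name and all hyperparameter values are hypothetical.

```python
import numpy as np

def draw_cusp_variances(H, alpha=5.0, a_theta=2.0, b_theta=2.0, theta_inf=0.05, rng=None):
    # Stick-breaking weights w_l = v_l * prod_{m<l} (1 - v_m), with v_l ~ Beta(1, alpha).
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation at H columns: the weights sum to one
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    pi = np.cumsum(w)  # probability that column h is in the spike; non-decreasing in h
    spike = rng.random(H) < pi
    # Inverse-gamma(a, b) slab: reciprocal of Gamma(shape=a, rate=b), i.e. scale = 1/b.
    slab = 1.0 / rng.gamma(a_theta, 1.0 / b_theta, size=H)
    return np.where(spike, theta_inf, slab), pi

theta, pi = draw_cusp_variances(10, rng=np.random.default_rng(0))
```

Because the spike probability increases with the column index, the prior favours a small number of active factors without fixing that number in advance.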

arXiv:2201.12234

Increasing the skill of short-term wind speed ensemble forecasts combining forecasts and observations via a new dynamic calibration

Authors: Gabriele Casciaro, Francesco Ferrari, Daniele Lagomarsino Oneto, Andrea Lira-Loarca, Andrea Mazzino

Abstract: All numerical weather prediction models used for the wind industry need to produce their forecasts starting from the main synoptic hours 00, 06, 12, and 18 UTC, once the analysis becomes available. The six-hour latency between two consecutive model runs calls for strategies that fill the gap by providing new, accurate predictions at least hourly. This is needed to meet the demand of traders and system regulators for frequent, accurate, and fresh information with which to continuously adapt their work strategies. Here, we propose a strategy in which quasi-real-time observed wind speed and weather model predictions are combined by means of a novel Ensemble Model Output Statistics (EMOS) strategy. The success of our strategy is measured by comparisons against observed wind speed from SYNOP stations over Italy in the years 2018 and 2019.

Submitted 28 January, 2022; originally announced January 2022.
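
For readers unfamiliar with EMOS, a minimal sketch of the standard Gaussian variant: the predictive distribution is N(a + b·m, c + d·s²), where m and s² are the ensemble mean and variance, and the four coefficients are fitted by minimizing the mean closed-form CRPS. This is not the paper's dynamic calibration, which also feeds in quasi-real-time observations; a Gaussian (rather than truncated) distribution for wind speed, the function names, and the synthetic data are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    # Closed-form CRPS of N(mu, sigma^2) evaluated at observation y.
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_emos(ens, obs):
    # ens: (n_cases, n_members) ensemble forecasts; obs: (n_cases,) observations.
    m, s2 = ens.mean(axis=1), ens.var(axis=1)

    def mean_crps(params):
        a, b, log_c, log_d = params
        mu = a + b * m
        sigma = np.sqrt(np.exp(log_c) + np.exp(log_d) * s2)  # log scale keeps c, d > 0
        return crps_normal(mu, sigma, obs).mean()

    res = minimize(mean_crps, x0=np.array([0.0, 1.0, 0.0, 0.0]),
                   method="Nelder-Mead", options={"maxiter": 4000})
    return res.x

# Synthetic example: observations are a biased, rescaled version of the ensemble centre.
rng = np.random.default_rng(0)
center = rng.uniform(2, 12, 500)
ens = center[:, None] + rng.normal(0, 1, (500, 10))
obs = 1.0 + 0.8 * center + rng.normal(0, 1.5, 500)
a, b, log_c, log_d = fit_emos(ens, obs)
```

Minimizing CRPS rather than maximizing likelihood is the usual EMOS choice because it rewards calibration and sharpness jointly.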

arXiv:2107.13783

Efficiently resolving rotational ambiguity in Bayesian matrix sampling with matching

Authors: Evan Poworoznek, Niccolo Anceschi, Federico Ferrari, David Dunson

Abstract: A wide class of Bayesian models involves unidentifiable random matrices that display rotational ambiguity, with the Gaussian factor model being a typical example. A rich variety of Markov chain Monte Carlo (MCMC) algorithms have been proposed for sampling the parameters of these models. However, without identifiability constraints, reliable posterior summaries of the parameters cannot be obtained directly from the MCMC output. As an alternative, we propose a computationally efficient post-processing algorithm that allows inference on non-identifiable parameters. We first orthogonalize the posterior samples using Varimax and then tackle label and sign switching with a greedy matching algorithm. We compare its performance and computational complexity with those of other methods using a simulation study and chemical exposures data. The algorithm implementation is available in the infinitefactor R package on CRAN.

Submitted 15 August, 2024; v1 submitted 29 July, 2021; originally announced July 2021.
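
The two post-processing steps can be sketched in Python as follows. This is not the infinitefactor code: the varimax routine is the textbook SVD iteration, and the greedy rule (repeatedly pair the sample column and reference column with the largest absolute inner product, then fix the sign) is one plausible matching criterion, assumed here for illustration. In use, each rotated posterior sample of the loadings would be matched to a common reference before averaging.

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    # Classic varimax rotation (gamma = 1) via the SVD-based iteration.
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # nearest orthogonal matrix to the gradient
        crit_old, crit = crit, s.sum()
        if crit_old > 0 and crit < crit_old * (1 + tol):
            break
    return L @ R

def greedy_match(L, ref):
    # Hypothetical rule: align columns of L to ref by largest |inner product|,
    # resolving both label switching (column order) and sign switching.
    k = ref.shape[1]
    sim = ref.T @ L  # sim[i, j] = <ref[:, i], L[:, j]>
    out = np.zeros_like(ref)
    used_ref, used_col = [], []
    for _ in range(k):
        masked = np.abs(sim)
        masked[used_ref, :] = -np.inf
        masked[:, used_col] = -np.inf
        i, j = np.unravel_index(np.argmax(masked), masked.shape)
        out[:, i] = np.sign(sim[i, j]) * L[:, j]
        used_ref.append(i)
        used_col.append(j)
    return out

# Example: a reference loadings matrix and a copy with permuted, sign-flipped columns.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 4))
switched = ref[:, [2, 0, 3, 1]] * np.array([1.0, -1.0, -1.0, 1.0])
aligned = greedy_match(switched, ref)
rotated = varimax(ref)
```

Varimax only removes the continuous rotational freedom; the discrete permutation and sign ambiguity that remains is what the matching step handles.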

arXiv:1911.01910

Identifying main effects and interactions among exposures using Gaussian processes

Authors: Federico Ferrari, David B. Dunson

Abstract: This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instead on selection of main effects and interactions. For interpretability, we decompose the expected health outcome into a linear main effect, pairwise interactions, and a non-linear deviation. Our interest is in model selection for these different components, accounting for uncertainty and addressing non-identifiability between the linear and nonparametric components of the semiparametric model. We propose a Bayesian approach to inference, placing variable selection priors on the different components and developing a Markov chain Monte Carlo (MCMC) algorithm. A key component of our approach is the incorporation of a heredity constraint to include interactions only in the presence of main effects, effectively reducing the dimensionality of the model search. We adapt a projection approach developed in the spatial statistics literature to enforce identifiability in modeling the nonparametric component using a Gaussian process. We also employ a dimension reduction strategy to sample the non-linear random effects that aids the mixing of the MCMC algorithm. The proposed MixSelect framework is evaluated using a simulation study and illustrated using data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.

Submitted 16 April, 2020; v1 submitted 5 November, 2019; originally announced November 2019.
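
To make the dimensionality reduction from the heredity constraint concrete, a small sketch that lists which pairwise interactions remain eligible given the current main-effect inclusion indicators. The helper `allowed_interactions` is hypothetical, and whether MixSelect uses the strong or the weak form of heredity is not stated here, so both are shown as an assumption.

```python
from itertools import combinations

def allowed_interactions(main_included, strong=True):
    # Hypothetical helper: pairwise interactions (j, k) permitted by a heredity
    # constraint, given 0/1 (or bool) main-effect inclusion indicators.
    p = len(main_included)
    if strong:
        # Strong heredity: both main effects must already be in the model.
        active = [j for j in range(p) if main_included[j]]
        return list(combinations(active, 2))
    # Weak heredity: at least one of the two main effects must be in the model.
    return [(j, k) for j, k in combinations(range(p), 2)
            if main_included[j] or main_included[k]]

strong_pairs = allowed_interactions([1, 0, 1, 1])
weak_pairs = allowed_interactions([1, 0, 0, 1], strong=False)
```

With p exposures of which q have main effects included, strong heredity shrinks the candidate set from p(p-1)/2 to q(q-1)/2 interactions.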

arXiv:1910.05355

Nonparametric Bayesian multi-armed bandits for single cell experiment design

Authors: Federico Camerlenghi, Bianca Dumitrascu, Federico Ferrari, Barbara E. Engelhardt, Stefano Favaro

Abstract: The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large-scale experiment for the collection of scRNA-seq data for the purpose of, but not limited to, creating cell atlases. Our approach relies on the following tools: i) a hierarchical Pitman-Yor prior that recapitulates biological assumptions regarding cellular differentiation, and ii) a Thompson sampling multi-armed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We empirically show that our approach outperforms state-of-the-art methods and achieves near-oracle performance on simulated and scRNA-seq data alike. HPY-TS code is available at https://github.com/fedfer/HPYsinglecell.

Submitted 20 September, 2020; v1 submitted 11 October, 2019; originally announced October 2019.
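
The quantity that drives discovery-oriented allocation under a Pitman-Yor prior is the predictive probability that the next cell is of a previously unseen type, (θ + σK)/(θ + n), after n cells showing K distinct types. The sketch below uses it in a simple greedy rule that treats arms independently with fixed, hypothetical parameters; this is a deliberate simplification, not HPY-TS, which uses a hierarchical prior that shares types across arms, Thompson sampling, and sequential Monte Carlo.

```python
import numpy as np

def py_new_type_prob(n, k, theta, sigma):
    # Pitman-Yor predictive probability that observation n + 1 is a new type,
    # given n observations with k distinct types (0 <= sigma < 1, theta > -sigma).
    return (theta + sigma * k) / (theta + n)

def choose_arm(counts, distinct, theta, sigma):
    # Greedy allocation (simplified, not Thompson sampling): sample next from the
    # arm whose next cell is most likely to be a new type.
    probs = [py_new_type_prob(n, k, theta, sigma) for n, k in zip(counts, distinct)]
    return int(np.argmax(probs)), probs

# Two hypothetical arms: 10 cells with 4 types versus 50 cells with 5 types.
arm, probs = choose_arm([10, 50], [4, 5], theta=1.0, sigma=0.5)
```

Thompson sampling would instead draw the prior parameters from their posterior before each choice, which adds the exploration that this greedy rule lacks.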

arXiv:1904.11603

Bayesian Factor Analysis for Inference on Interactions

Authors: Federico Ferrari, David B Dunson

Abstract: This article is motivated by the problem of inference on interactions among chemical exposures impacting human health outcomes. Chemicals often co-occur in the environment or in synthetic mixtures and, as a result, exposure levels can be highly correlated. We propose a latent factor joint model, which includes shared factors in both the predictor and response components while assuming conditional independence. By including a quadratic regression in the latent variables in the response component, we induce flexible dimension reduction in characterizing main effects and interactions. We propose a Bayesian approach to inference under this Factor analysis for INteractions (FIN) framework. Through appropriate modifications of the factor modeling structure, FIN can accommodate higher-order interactions and multivariate outcomes. We provide theory on posterior consistency and the impact of misspecifying the number of factors. We evaluate the performance using a simulation study and data from the National Health and Nutrition Examination Survey (NHANES). Code is available on GitHub.

Submitted 8 January, 2020; v1 submitted 25 April, 2019; originally announced April 2019.
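
A quadratic regression in the latent factors induces main effects and interactions on the observed exposures. A minimal sketch, assuming x = Λη + e with η ~ N(0, I), e ~ N(0, Ψ) with diagonal Ψ, and y = ω'η + η'Ωη + noise: then η | x ~ N(Ax, V) with V = (I + Λ'Ψ⁻¹Λ)⁻¹ and A = VΛ'Ψ⁻¹, so E(y | x) = tr(ΩV) + (A'ω)'x + x'(A'ΩA)x. The function name and example values are hypothetical; the entry (j, l) of the returned matrix M contributes 2·M[j, l] as the coefficient of x_j x_l when j ≠ l.

```python
import numpy as np

def induced_effects(Lambda, psi, omega, Omega):
    # Lambda: (p, k) loadings; psi: (p,) idiosyncratic variances;
    # omega: (k,) linear coefficients; Omega: (k, k) symmetric quadratic coefficients.
    k = Lambda.shape[1]
    LtPinv = Lambda.T / psi  # Lambda' Psi^{-1}, shape (k, p)
    V = np.linalg.inv(np.eye(k) + LtPinv @ Lambda)  # Var(eta | x)
    A = V @ LtPinv  # E(eta | x) = A x
    intercept = np.trace(Omega @ V)
    main = A.T @ omega  # induced main effects of the exposures
    M = A.T @ Omega @ A  # E(y | x) = intercept + main' x + x' M x
    return intercept, main, M

rng = np.random.default_rng(0)
Lambda = rng.normal(size=(5, 2))
psi = np.full(5, 0.5)
omega = np.array([1.0, -0.5])
Omega = np.array([[0.3, 0.1], [0.1, -0.2]])
intercept, main, M = induced_effects(Lambda, psi, omega, Omega)
```

Since M has rank at most k, correlated exposures share a low-dimensional interaction structure, which is the dimension reduction the abstract refers to.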

Showing 1–7 of 7 results for author: Ferrari, F