-
Semi-Markov multistate modeling approaches for multicohort event history data
Authors:
Xavier Piulachs,
Klaus Langohr,
Mireia Besalú,
Natalia Pallarès,
Jordi Carratalà,
Cristian Tebé,
Guadalupe Gómez Melis
Abstract:
Two Cox-based multistate modeling approaches are compared for analyzing a complex multicohort event history process. The first approach incorporates cohort information as a fixed covariate, thereby providing a direct estimation of the cohort-specific effects. The second approach includes the cohort as a stratum variable, thus giving extra flexibility in estimating the transition probabilities. Additionally, both approaches may include possible interaction terms between the cohort and a given prognostic predictor. Furthermore, the Markov property conditional on observed prognostic covariates is assessed using a global score test. Whenever departures from the Markovian assumption are revealed for a given transition, the time of entry into the current state is incorporated as a fixed covariate, yielding a semi-Markov process. The two proposed methods are applied to a three-wave dataset of COVID-19-hospitalized adults in the southern Barcelona metropolitan area (Spain), and the corresponding performance is discussed. While both semi-Markovian approaches are shown to be useful, the preferred one will depend on the focus of the inference. To summarize, the cohort-covariate approach enables an insightful discussion of the behavior of the cohort effects, whereas the stratum-cohort approach provides flexibility to estimate transition-specific underlying risks according to the different cohorts.
Submitted 19 February, 2024;
originally announced February 2024.
-
Design of Trials with Composite Endpoints with the R Package CompAREdesign
Authors:
Jordi Cortés Martínez,
Marta Bofill Roig,
Guadalupe Gómez Melis
Abstract:
Composite endpoints are widely used as primary endpoints in clinical trials. Designing trials with time-to-event endpoints can be particularly challenging because the proportional hazards assumption usually does not hold when using a composite endpoint, even when it holds for each of its components. Consequently, the conventional formulae for sample size calculation no longer apply. We present the R package CompAREdesign, by means of which the key elements of trial designs, such as the sample size and effect sizes, can be computed based on the information on the composite endpoint components. CompAREdesign provides functions to assess the sensitivity and robustness of design calculations to variations in initial values and assumptions. Furthermore, we describe other features of the package, such as functions for the design of trials with binary composite endpoints, and functions to simulate trials with composite endpoints under a wide range of scenarios.
Submitted 4 November, 2022;
originally announced November 2022.
-
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization
Authors:
Gábor Melis
Abstract:
Tail Averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, Tail Averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Polyak average. However, the number of leading iterates to ignore is an important hyperparameter, and starting averaging too early or too late leads to inefficient use of resources or suboptimal solutions. Our work focusses on improving generalization, which makes setting this hyperparameter even more difficult, especially in the presence of other hyperparameters and overfitting. Furthermore, before averaging starts, the loss is only weakly informative of the final performance, which makes early stopping unreliable. To alleviate these problems, we propose an anytime variant of Tail Averaging, intended for improving generalization rather than pure optimization, that has no hyperparameters and approximates the optimal tail at all optimization steps. Our algorithm is based on two running averages with adaptive lengths bounded in terms of the optimal tail length, one of which achieves approximate optimality with some regularity. Requiring only the additional storage for two sets of weights and periodic evaluation of the loss, the proposed Two-Tailed Averaging algorithm is a practical and widely applicable method for improving generalization.
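As an illustration of the baseline this abstract starts from, here is a minimal sketch of plain Tail Averaging compared with Polyak averaging. It is not the Two-Tailed Averaging algorithm itself. The toy objective, the learning rate, the noise level, and the names `tail_average` and `noisy_sgd` are all assumptions made for the example.

```python
import random


def tail_average(iterates, skip):
    """Average the iterates after dropping the first `skip` of them.

    skip=0 gives the plain Polyak average over all iterates.
    """
    tail = iterates[skip:]
    if not tail:
        raise ValueError("skip must be smaller than the number of iterates")
    dim = len(tail[0])
    return [sum(w[i] for w in tail) / len(tail) for i in range(dim)]


def noisy_sgd(steps, lr=0.1, noise=0.5, seed=0):
    """Toy SGD on f(w) = 0.5 * w**2 with Gaussian gradient noise.

    Starts far from the minimum at 0 and uses a constant learning rate,
    so the iterates keep fluctuating around the minimum.
    """
    rng = random.Random(seed)
    w, iterates = 10.0, []
    for _ in range(steps):
        grad = w + rng.gauss(0.0, noise)  # exact gradient is w
        w -= lr * grad
        iterates.append([w])
    return iterates


iterates = noisy_sgd(steps=500)
polyak = tail_average(iterates, skip=0)[0]   # includes the initial transient
tail = tail_average(iterates, skip=100)[0]   # skips the first 100 iterates
print(f"last iterate:   {iterates[-1][0]:+.4f}")
print(f"Polyak average: {polyak:+.4f}")
print(f"tail average:   {tail:+.4f}")
```

In this toy setting the Polyak average is typically pulled toward the starting point by the early iterates, while the tail average is not. The value `skip=100` is hand-picked here, which is exactly the hyperparameter the abstract says is hard to set.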
Submitted 17 April, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Adaptive clinical trial designs with blinded selection of binary composite endpoints and sample size reassessment
Authors:
Marta Bofill Roig,
Guadalupe Gómez Melis,
Martin Posch,
Franz Koenig
Abstract:
For randomized clinical trials where a single, primary, binary endpoint would require unfeasibly large sample sizes, composite endpoints are widely chosen as the primary endpoint. Despite being commonly used, composite endpoints entail challenges in designing and interpreting results. Given that the components may be of different relevance and have different effect sizes, the choice of components must be made carefully. In particular, sample size calculations for composite binary endpoints depend not only on the anticipated effect sizes and event probabilities of the composite components, but also on the correlation between them. However, information on the correlation between endpoints is usually not reported in the literature, which can be an obstacle to planning sound future trial designs. We consider two-arm randomized controlled trials with a primary composite binary endpoint and an endpoint that consists only of the clinically more important component of the composite endpoint. We propose a trial design that allows an adaptive modification of the primary endpoint based on blinded information obtained at an interim analysis. We consider a decision rule to select between a composite endpoint and its most relevant component as the primary endpoint. The decision rule chooses the endpoint with the lower estimated required sample size. Additionally, the sample size is reassessed using the estimated event probabilities and correlation, and the expected effect sizes of the composite components. We investigate the statistical power and significance level under the proposed design through simulations. We show that the adaptive design is equally or more powerful than designs without adaptive modification of the primary endpoint. The targeted power is achieved even if the correlation is misspecified, while maintaining the type 1 error. We illustrate the proposal by means of two case studies.
Submitted 20 June, 2022;
originally announced June 2022.
-
Mutual Information Constraints for Monte-Carlo Objectives
Authors:
Gábor Melis,
András György,
Phil Blunsom
Abstract:
A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the tighter bounds of Monte-Carlo objectives and constraints on the mutual information between the observable and the latent variables. Estimating the mutual information as the average Kullback-Leibler divergence between the easily available variational posterior $q(z|x)$ and the prior does not work with Monte-Carlo objectives because $q(z|x)$ is no longer a direct approximation to the model's true posterior $p(z|x)$. Hence, we construct estimators of the Kullback-Leibler divergence of the true posterior from the prior by recycling samples used in the objective, with which we train models of continuous and discrete latents with much improved rate-distortion and no posterior collapse. While alleviated, the tradeoff between modelling the data and using the latents still remains, and we urge that inference methods be evaluated across a range of mutual information values.
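To make the notion of a Monte-Carlo objective concrete, the sketch below computes a K-sample importance-weighted estimate of $\log p(x)$, assuming this is the kind of objective the abstract refers to. The toy model ($z \sim N(0,1)$, $x \mid z \sim N(z,1)$) and all function names are assumptions chosen so that the exact posterior and marginal are known in closed form. The paper's mutual-information estimators are not reproduced here.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def monte_carlo_objective(x, q_mean, q_var, k, rng):
    """One draw of the K-sample importance-weighted estimate of log p(x).

    Toy model (an assumption for this sketch): z ~ N(0, 1), x | z ~ N(z, 1).
    Proposal: q(z | x) = N(q_mean, q_var). The expectation of the returned
    value is a lower bound on log p(x); k=1 corresponds to the usual ELBO.
    """
    log_w = []
    for _ in range(k):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_w.append(log_joint - log_normal_pdf(z, q_mean, q_var))
    return logsumexp(log_w) - math.log(k)


rng = random.Random(0)
x = 1.5
exact = log_normal_pdf(x, 0.0, 2.0)  # marginal of the toy model is N(0, 2)
for k in (1, 10, 100):
    # Deliberately poor proposal N(0, 1); the true posterior is N(x/2, 1/2).
    runs = [monte_carlo_objective(x, 0.0, 1.0, k, rng) for _ in range(2000)]
    print(f"K={k:3d}  mean estimate={sum(runs) / len(runs):.4f}  exact={exact:.4f}")
```

When the proposal equals the true posterior $N(x/2, 1/2)$, every importance weight equals $p(x)$ and the estimate is exact for any K. With a poorer proposal, a larger K tightens the bound, which is why $q(z|x)$ stops being a direct approximation to the true posterior under such objectives.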
Submitted 9 May, 2022; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Design of phase III trials with long-term survival outcomes based on short-term binary results
Authors:
Marta Bofill Roig,
Yu Shen,
Guadalupe Gómez Melis
Abstract:
Pathologic complete response (pCR) is a common primary endpoint for a phase II trial or even accelerated approval of neoadjuvant cancer therapy. If granted, a two-arm confirmatory trial is often required to demonstrate the efficacy with a time-to-event outcome such as overall survival. However, the design of a subsequent phase III trial based on prior information on the pCR effect is not straightforward. Aiming at designing such phase III trials with overall survival as primary endpoint using pCR information from previous trials, we consider a mixture model that incorporates both the survival and the binary endpoints. We propose to base the comparison between arms on the difference of the restricted mean survival times, and show how the effect size and sample size for overall survival rely on the probability of the binary response and the survival distribution by response status, both for each treatment arm. Moreover, we provide the sample size calculation under different scenarios and accompany them with an R package where all the computations have been implemented. We evaluate our proposal with a simulation study, and illustrate its application through a neoadjuvant breast cancer trial.
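The dependence of the survival effect size on the response probability and on survival by response status can be sketched with a two-component mixture. The example assumes exponential survival within responders and non-responders, which is an assumption for illustration and not stated in the abstract. The function names and all numeric inputs are hypothetical.

```python
from math import expm1


def rmst_exponential(rate, tau):
    """Restricted mean survival time up to tau for an Exp(rate) survival time.

    RMST = integral_0^tau exp(-rate * t) dt = (1 - exp(-rate * tau)) / rate.
    """
    return -expm1(-rate * tau) / rate


def rmst_mixture(p_resp, rate_resp, rate_nonresp, tau):
    """RMST of an arm whose survival is a mixture over response status.

    S(t) = p_resp * S_resp(t) + (1 - p_resp) * S_nonresp(t), and the RMST is
    linear in S, so it is the same mixture of the two component RMSTs.
    """
    return (p_resp * rmst_exponential(rate_resp, tau)
            + (1 - p_resp) * rmst_exponential(rate_nonresp, tau))


# Hypothetical design inputs: time in years, truncation at tau = 5.
tau = 5.0
control = rmst_mixture(p_resp=0.30, rate_resp=0.05, rate_nonresp=0.20, tau=tau)
treated = rmst_mixture(p_resp=0.45, rate_resp=0.05, rate_nonresp=0.20, tau=tau)
print(f"RMST control: {control:.3f} years")
print(f"RMST treated: {treated:.3f} years")
print(f"RMST difference (effect size): {treated - control:.3f} years")
```

In this sketch the two arms differ only in the probability of response, so the whole RMST difference is driven by the gain in pCR rate, scaled by the survival gap between responders and non-responders.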
Submitted 24 May, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
-
A class of two-sample nonparametric statistics for binary and time-to-event outcomes
Authors:
Marta Bofill Roig,
Guadalupe Gómez Melis
Abstract:
We propose a class of two-sample statistics for testing the equality of proportions and the equality of survival functions. We build our proposal on a weighted combination of a score test for the difference in proportions and a Weighted Kaplan-Meier statistic-based test for the difference of survival functions. The proposed statistics are fully non-parametric and do not rely on the proportional hazards assumption for the survival outcome. We present the asymptotic distribution of these statistics, propose a variance estimator and show their asymptotic properties under fixed and local alternatives. We discuss different choices of weights including those that control the relative relevance of each outcome and emphasize the type of difference to be detected in the survival outcome. We evaluate the performance of these statistics with a simulation study, and illustrate their use with a randomized phase III cancer vaccine trial. We have implemented the proposed statistics in the R package SurvBin, available on GitHub (https://github.com/MartaBofillRoig/SurvBin).
Submitted 4 April, 2020; v1 submitted 4 February, 2020;
originally announced February 2020.
-
Decision tool and Sample Size Calculator for Composite Endpoints
Authors:
Marta Bofill Roig,
Jordi Cortés Martínez,
Guadalupe Gómez Melis
Abstract:
Summary points:
- This article considers the combination of two binary or two time-to-event endpoints to form the primary composite endpoint of a trial.
- It discusses the relative efficiency of choosing a composite endpoint over one of its components in terms of: the frequencies of observing each component; the relative treatment effect of the tested therapy; and the association between both components.
- We highlight the very important role of the association between components in choosing the most efficient endpoint to use as primary.
- For better grounded future trials, we recommend that trialists always report the association between components of the composite endpoint.
- Common fallacies to note when using composite endpoints: i) composite endpoints always imply higher power; ii) treatment effect on the composite endpoint is similar to the average effects of its components; and iii) the probability of observing the primary endpoint increases significantly.
Submitted 10 January, 2020;
originally announced January 2020.
-
Non-constant hazard ratios in randomized controlled trials with composite endpoints
Authors:
Jordi Cortés Martínez,
Moisès Gómez Mateu,
KyungMann Kim,
Guadalupe Gómez Melis
Abstract:
The hazard ratio is routinely used as a summary measure to assess the treatment effect in clinical trials with time-to-event endpoints. It is frequently assumed to be constant over time, although this assumption often does not hold. When the hazard ratio deviates considerably from being constant, the average of its plausible values is not a valid measure of the treatment effect and can be clinically misleading, and common sample size formulas are not appropriate.
In this paper, we study the hazard ratio over time of a two-component composite endpoint under the assumption that the hazard ratio for each component is constant.
This work considers two measures for quantifying the non-proportionality of the hazard ratio: the difference $D$ between the maximum and minimum values of the hazard ratio over time and the relative measure $R$ representing the ratio between the sample sizes for the minimum detectable and the average effects. We illustrate $D$ and $R$ by means of the ZODIAC trial, where the primary endpoint was progression-free survival.
We have run a simulation study deriving scenarios for different values of the hazard ratios, different event rates and different degrees of association between the components. We illustrate situations that yield non-constant hazard ratios for the composite endpoints and consider the likely impact on sample size.
Results show that the distance between the two component hazard ratios plays an important role, especially when they are close to 1. Furthermore, even when the treatment effects for each component are similar, if the two component hazards are markedly different, the hazard ratio of the composite is often non-constant.
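A minimal sketch of why the composite hazard ratio varies over time, under the simplifying assumption that the two component event times are independent within each arm (the paper also studies associated components, which this sketch does not cover). Under independence, the hazard of the time to the first event is the sum of the component hazards. The Weibull baseline hazards, parameter values, and function names are assumptions for illustration.

```python
def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull(shape, scale) time at t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def composite_hazard_ratio(t, hr1, hr2, base1, base2):
    """Hazard ratio at time t for the time to the first of two events.

    base1 and base2 are (shape, scale) pairs for the control-arm Weibull
    hazards; hr1 and hr2 are the constant component hazard ratios.
    Assumes independent components, so hazards add within each arm.
    """
    h1 = weibull_hazard(t, *base1)
    h2 = weibull_hazard(t, *base2)
    return (hr1 * h1 + hr2 * h2) / (h1 + h2)


def hazard_ratio_spread(hr1, hr2, base1, base2, times):
    """D: difference between the largest and smallest composite HR on a grid."""
    values = [composite_hazard_ratio(t, hr1, hr2, base1, base2) for t in times]
    return max(values) - min(values)


# Grid on (0, 1]; t = 0 is excluded because a shape below 1 diverges there.
times = [i / 100 for i in range(1, 101)]
d_same_shape = hazard_ratio_spread(0.6, 0.9, (1.0, 1.0), (1.0, 2.0), times)
d_diff_shape = hazard_ratio_spread(0.6, 0.9, (0.5, 1.0), (2.0, 1.0), times)
print(f"D with proportional baseline hazards: {d_same_shape:.4f}")
print(f"D with diverging baseline hazards:    {d_diff_shape:.4f}")
```

The composite hazard ratio is a weighted average of the two component hazard ratios, with weights given by each component's share of the total control-arm hazard. It is constant only when the component hazard ratios coincide or when that share does not change over time.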
Submitted 25 July, 2019;
originally announced July 2019.
-
Unsupervised Recurrent Neural Network Grammars
Authors:
Yoon Kim,
Alexander M. Rush,
Lei Yu,
Adhiguna Kuncoro,
Chris Dyer,
Gábor Melis
Abstract:
Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
Submitted 4 August, 2019; v1 submitted 7 April, 2019;
originally announced April 2019.
-
A new approach for sizing trials with composite binary endpoints using anticipated marginal values and accounting for the correlation between components
Authors:
Marta Bofill Roig,
Guadalupe Gómez Melis
Abstract:
Composite binary endpoints are increasingly used as primary endpoints in clinical trials. When designing a trial, it is crucial to determine the appropriate sample size for testing the statistical differences between treatment groups for the primary endpoint. As shown in this work, when using a composite binary endpoint to size a trial, one needs to specify the event rates and the effect sizes of the composite components as well as the correlation between them. In practice, the marginal parameters of the components can be obtained from previous studies or pilot trials; however, the correlation is often not previously reported and thus usually unknown. We first show that the sample size for composite binary endpoints is strongly dependent on the correlation and, second, that slight deviations in the prior information on the marginal parameters may result in underpowered trials for achieving the study objectives at a pre-specified significance level. We propose a general strategy for calculating the required sample size when the correlation is not specified, accounting for uncertainty in the marginal parameter values. We present the web platform CompARE to characterize composite endpoints and to calculate the sample size as proposed in this paper. We evaluate the performance of the proposal with a simulation study, and illustrate it by means of a real case study using CompARE.
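The dependence of the sample size on the correlation can be sketched as follows. The composite event probability follows from the marginal probabilities and the Pearson correlation between the two binary components. The sample size uses the textbook formula for comparing two proportions, which is a stand-in and not necessarily the formula used in the paper or in CompARE. The function names, the marginal values, and the assumption of equal correlation in both arms are all hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def composite_probability(p1, p2, rho):
    """P(E1 or E2) for two binary events with Pearson correlation rho."""
    joint = p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # The joint probability must stay within the Frechet bounds.
    lower, upper = max(0.0, p1 + p2 - 1), min(p1, p2)
    if not lower - 1e-12 <= joint <= upper + 1e-12:
        raise ValueError("rho is not attainable for these marginals")
    return p1 + p2 - joint


def n_per_arm(p_ctrl, p_trt, alpha=0.05, power=0.8):
    """Textbook per-arm sample size, two-sided test of two proportions."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    var = p_ctrl * (1 - p_ctrl) + p_trt * (1 - p_trt)
    return ceil((z_a + z_b) ** 2 * var / (p_ctrl - p_trt) ** 2)


# Hypothetical marginals: component event probabilities in each arm.
ctrl = (0.10, 0.20)
trt = (0.07, 0.16)
for rho in (0.0, 0.25, 0.5):
    pc = composite_probability(*ctrl, rho)
    pt = composite_probability(*trt, rho)
    print(f"rho={rho:.2f}  P_ctrl={pc:.4f}  P_trt={pt:.4f}  "
          f"n per arm={n_per_arm(pc, pt)}")
```

With these inputs the required sample size grows as the correlation increases, because strongly correlated components add fewer distinct events to the composite. An unreported correlation therefore leaves a range of plausible sample sizes, not a single value.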
Submitted 16 November, 2018; v1 submitted 3 July, 2018;
originally announced July 2018.
-
Pushing the bounds of dropout
Authors:
Gábor Melis,
Charles Blundell,
Tomáš Kočiský,
Karl Moritz Hermann,
Chris Dyer,
Phil Blunsom
Abstract:
We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that since the deterministic subvariant's bound is equal to its objective, and the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.
Submitted 27 September, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.