-
Simulating Complex Crossectional and Longitudinal Data using the simDAG R Package
Authors:
Robin Denz,
Nina Timmesfeld
Abstract:
Generating artificial data is a crucial step when performing Monte-Carlo simulation studies. Depending on the planned study, complex data generation processes (DGP) containing multiple, possibly time-varying, variables with various forms of dependencies and data types may be required. Simulating data from such DGP may therefore become a difficult and time-consuming endeavor. The simDAG R package offers a standardized approach to generate data from simple and complex DGP based on the definition of structural equations in directed acyclic graphs using arbitrary functions or regression models. The package offers a clear syntax with an enhanced formula interface and directly supports generating binary, categorical, count and time-to-event data with arbitrary dependencies, possibly non-linear relationships and interactions. It additionally includes a framework to conduct discrete-time based simulations, which allows the generation of longitudinal data on a semi-continuous time-scale. This approach may be used to generate time-to-event data with recurrent or competing events and possibly multiple time-varying covariates, which may themselves have arbitrary data types. In this article we demonstrate the wide range of features included in simDAG by replicating the DGP of multiple real Monte-Carlo simulation studies.
Submitted 2 June, 2025;
originally announced June 2025.
-
adjustedCurves: Estimating Confounder-Adjusted Survival Curves in R
Authors:
Robin Denz,
Nina Timmesfeld
Abstract:
Kaplan-Meier curves stratified by treatment allocation are the most popular way to depict causal effects in studies with right-censored time-to-event endpoints. If the treatment is randomly assigned and the sample size of the study is adequate, this method produces unbiased estimates of the population-averaged counterfactual survival curves. However, in the presence of confounding, this is no longer the case. Instead, specific methods that allow adjustment for confounding must be used. We present the adjustedCurves R package, which can be used to estimate and plot these confounder-adjusted survival curves using a variety of methods from the literature. It provides a convenient wrapper around existing R packages on the topic and adds further methods and functionality on top of them, uniting the sometimes vastly different methods under one consistent framework. Among the additional features are the estimation of confidence intervals, confounder-adjusted restricted mean survival times and confounder-adjusted survival time quantiles. After giving a brief overview of the implemented methods, we illustrate the package using publicly available data from an observational study including 2982 breast cancer patients.
Submitted 23 February, 2024;
originally announced February 2024.
-
Impact of Record-Linkage Errors in Covid-19 Vaccine-Safety Analyses using German Health-Care Data: A Simulation Study
Authors:
Robin Denz,
Katharina Meiszl,
Peter Ihle,
Doris Oberle,
Ursula Drechsel-Bäuerle,
Katrin Scholz,
Ingo Meyer,
Nina Timmesfeld
Abstract:
With unprecedented speed, 192,248,678 doses of Covid-19 vaccines were administered in Germany by July 11, 2023 to combat the pandemic. Limitations of clinical trials imply that the safety profile of these vaccines is not fully known before marketing. However, routine health-care data can help address these issues. Despite the high proportion of insured people, the analysis of vaccination-related data is challenging in Germany. Generally, the Covid-19 vaccination status and other health-care data are stored in separate databases, without persistent and database-independent person identifiers. Error-prone record-linkage techniques must be used to merge these databases. Our aim was to use a Monte-Carlo simulation study to quantify the impact of record-linkage errors on the power and bias of different analysis methods designed to assess Covid-19 vaccine safety when using German health-care data. We used a discrete-time simulation and empirical data to generate realistic data with varying amounts of record-linkage errors. Afterwards, we analysed these data using a Cox model and the self-controlled case series (SCCS) method. Realistic proportions of random linkage errors had only a small effect on the power of either method. The SCCS method produced unbiased results even with a high percentage of linkage errors, while the Cox model underestimated the true effect.
Submitted 23 October, 2023;
originally announced October 2023.
-
Bias through time-varying covariates in the analysis of cohort stepped wedge trials: a simulation study
Authors:
Jale Basten,
Katja Ickstadt,
Nina Timmesfeld
Abstract:
In stepped wedge cluster randomized trials (SW-CRTs), observations collected under the control condition are, on average, from an earlier time than observations collected under the intervention condition. In a cohort design, participants are followed up throughout the study, so correlations between measurements within a participant depend on the timing at which the observations are made. Therefore, changes in participants' characteristics over time must be taken into account when estimating intervention effects. For example, participants' age progresses, which may impact the outcome over the study period. Motivated by an SW-CRT of a geriatric care intervention to improve quality of life, we conducted a simulation study to compare model formulations analysing data from an SW-CRT under different scenarios in which time was related to the covariates and the outcome. The aim was to find a model specification that produces reliable estimates of the intervention effect. Six linear mixed effects (LME) models with different specifications of fixed effects were fitted. Across 1000 simulations per parameter combination, we computed the mean and standard error of the estimated intervention effects. We found that LME models with fixed categorical time effects in addition to the fixed intervention effect, and with two random effects to account for clustering (within-cluster correlation) and multiple measurements on participants (within-individual correlation), seem to produce unbiased estimates of the intervention effect even if time-varying confounders or their functional influence on the outcome were unknown or unmeasured and if secular time trends occurred. Therefore, including (time-varying) covariates describing the study cohort seems to be avoidable.
Submitted 22 February, 2023;
originally announced February 2023.
-
Visualizing the (Causal) Effect of a Continuous Variable on a Time-To-Event Outcome
Authors:
Robin Denz,
Nina Timmesfeld
Abstract:
Visualization is a key aspect of communicating the results of any study aiming to estimate causal effects. In studies with time-to-event outcomes, the most popular visualization approach is depicting survival curves stratified by the variable of interest. This approach cannot be used when the variable of interest is continuous. Simple workarounds, such as categorizing the continuous covariate and plotting survival curves for each category, can result in misleading depictions of the main effects. Instead, we propose a new graphic, the survival area plot, to directly depict the survival probability over time and as a function of a continuous covariate simultaneously. This plot utilizes g-computation based on a suitable time-to-event model to obtain the relevant estimates. Through the use of g-computation, those estimates can be adjusted for confounding without additional effort, allowing a causal interpretation under the standard causal identifiability assumptions. If those assumptions are not met, the proposed plot may still be used to depict noncausal associations. We illustrate the proposed graphic and compare it to simpler alternatives using data from a large German observational study investigating the effect of the Ankle Brachial Index on survival. To facilitate the usage of these plots, we additionally developed the contsurvplot R package, which includes all methods discussed in this paper.
Submitted 6 March, 2023; v1 submitted 9 August, 2022;
originally announced August 2022.
-
A Comparison of Different Methods to Adjust Survival Curves for Confounders
Authors:
Robin Denz,
Renate Klaaßen-Mielke,
Nina Timmesfeld
Abstract:
Treatment-specific survival curves are an important tool to illustrate the treatment effect in studies with time-to-event outcomes. In non-randomized studies, unadjusted estimates can lead to biased depictions due to confounding. Multiple methods to adjust survival curves for confounders exist. However, it is currently unclear which method is the most appropriate in which situation. Our goal is to compare forms of Inverse Probability of Treatment Weighting, the G-Formula, Propensity Score Matching, Empirical Likelihood Estimation and augmented estimators as well as their pseudo-value-based counterparts in different scenarios with a focus on their bias and goodness-of-fit. We provide a short review of all methods and illustrate their usage by contrasting the survival of smokers and non-smokers, using data from the German Epidemiological Trial on Ankle-Brachial-Index. Subsequently, we compare the methods using a Monte-Carlo simulation. We consider scenarios in which correctly or incorrectly specified models for describing the treatment assignment and the time-to-event outcome are used with varying sample sizes. The bias and goodness-of-fit are determined by taking the entire survival curve into account. When used properly, all methods showed no systematic bias in medium to large samples. Cox-regression-based methods, however, showed systematic bias in small samples. The goodness-of-fit varied greatly between different methods and scenarios. Methods utilizing an outcome model were more efficient than other techniques, while augmented estimators using an additional treatment assignment model were unbiased when either model was correct, with a goodness-of-fit comparable to other methods. These doubly-robust methods have important advantages in every considered scenario.
Submitted 15 November, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.