-
Modeling times to multiple events under informative censoring with C-vine copula
Authors:
Xinyuan Chen,
Yiwei Li,
Qian M. Zhou
Abstract:
The study of times to nonterminal events of different types and their interrelation is a compelling area of interest. The primary challenge in analyzing such multivariate event times is the presence of informative censoring by the terminal event. While numerous statistical methods have been proposed for a single nonterminal event, i.e., semi-competing risks data, there remains a dearth of tools for analyzing times to multiple nonterminal events. These events involve more complex dependence structures between nonterminal and terminal events and between nonterminal events themselves. This paper introduces a novel modeling framework leveraging the vine copula to directly estimate the joint distribution of the multivariate times to nonterminal and terminal events. Unlike the few existing methods based on multivariate or nested copulas, our model excels in capturing the heterogeneous dependence between each pair of event times in terms of strength and structure. Furthermore, our model allows regression modeling for all the marginal distributions of times to nonterminal and terminal events, a feature lacking in existing methods. We propose a likelihood-based estimation and inference procedure, which can be implemented efficiently in sequential stages. Through simulation studies, we demonstrate the satisfactory finite-sample performance of our proposed stage-wise estimators and analytical variance estimators, as well as their superiority over existing methods. We apply our approach to data from a crowdfunding platform to investigate the relationship between creator-backer interactions of various types and a creator's lifetime on the platform.
Submitted 27 February, 2025;
originally announced February 2025.
-
Two-Stage Pseudo Maximum Likelihood Estimation of Semiparametric Copula-based Regression Models for Semi-Competing Risks Data
Authors:
Sakie J. Arachchige,
Xinyuan Chen,
Qian M. Zhou
Abstract:
We propose a two-stage estimation procedure for a copula-based model with semi-competing risks data, where the non-terminal event is subject to dependent censoring by the terminal event, and both events are subject to independent censoring. With a copula-based model, the marginal survival functions of individual event times are specified by semiparametric transformation models, and the dependence between the bivariate event times is specified by a parametric copula function. For the estimation procedure, in the first stage, the parameters associated with the marginal of the terminal event are estimated using only the corresponding observed outcomes, and in the second stage, the marginal parameters for the non-terminal event time and the copula parameter are estimated together via maximizing a pseudo-likelihood function based on the joint distribution of the bivariate event times. We derive the asymptotic properties of the proposed estimator and provide an analytic variance estimator for inference. Through simulation studies, we show that our approach leads to consistent estimates with less computational cost and more robustness than the one-stage procedure developed in Chen (2012), where all parameters were estimated simultaneously. In addition, our approach demonstrates better finite-sample performance than another existing two-stage estimation method proposed in Zhu et al. (2021). An R package PMLE4SCR is developed to implement our proposed method.
Submitted 25 October, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
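The copula construction described in the abstract above, marginal survival functions linked by a parametric copula, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Clayton family stands in for the unspecified parametric copula, exponential margins stand in for the semiparametric transformation models, and the function names are hypothetical. This is not the PMLE4SCR implementation.

```python
import math

def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula with parameter theta."""
    return theta / (theta + 2.0)

def exp_survival(t, rate):
    """Exponential marginal survival function (illustrative stand-in)."""
    return math.exp(-rate * t)

def joint_survival(t1, t2, rate1, rate2, theta):
    """Joint survival P(T1 > t1, T2 > t2) under a survival Clayton copula.

    For semi-competing risks data this expression is only identifiable
    on the region t1 <= t2 (non-terminal event before terminal event).
    """
    return clayton_copula(exp_survival(t1, rate1),
                          exp_survival(t2, rate2), theta)

# Example: moderate positive dependence (tau = 0.5 when theta = 2).
s_joint = joint_survival(1.0, 2.0, rate1=0.5, rate2=0.3, theta=2.0)
s_indep = exp_survival(1.0, 0.5) * exp_survival(2.0, 0.3)
```

With positive dependence, `s_joint` exceeds the product `s_indep` that independence would give, which is the extra structure the copula parameter contributes.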
-
Information matrix equivalence in the presence of censoring: A goodness-of-fit test for semiparametric copula models with multivariate survival data
Authors:
Qian M. Zhou
Abstract:
Various goodness-of-fit tests are designed based on the so-called information matrix equivalence: if the assumed model is correctly specified, two information matrices that are derived from the likelihood function are equivalent. In the literature, this principle has been established for the likelihood function with fully observed data, but it has not been verified under the likelihood for censored data. In this manuscript, we prove the information matrix equivalence in the framework of semiparametric copula models for multivariate censored survival data. Based on this equivalence, we propose an information ratio (IR) test for the specification of the copula function. The IR statistic is constructed via comparing consistent estimates of the two information matrices. We derive the asymptotic distribution of the IR statistic and propose a parametric bootstrap procedure for the finite-sample $P$-value calculation. The performance of the IR test is investigated via a simulation study and a real data example.
Submitted 23 July, 2023; v1 submitted 20 September, 2021;
originally announced September 2021.
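The information matrix equivalence that the abstract above builds on can be checked numerically in a much simpler setting. The sketch below is a toy under stated assumptions: a one-parameter exponential model with fully observed (uncensored) data, not the semiparametric copula model with censoring treated in the paper, and the scalar ratio computed here is only an analogue of the IR statistic. The function name is hypothetical.

```python
import random

def information_ratio(sample):
    """Ratio of the two information estimates under an exponential model.

    Log-density: log(rate) - rate * x
    Score:       1 / rate - x
    Hessian:     -1 / rate**2
    """
    n = len(sample)
    rate_hat = n / sum(sample)                      # maximum likelihood estimate
    sensitivity = 1.0 / rate_hat ** 2               # minus the average Hessian
    variability = sum((1.0 / rate_hat - x) ** 2     # average squared score
                      for x in sample) / n
    return variability / sensitivity

rng = random.Random(0)
exp_sample = [rng.expovariate(2.0) for _ in range(20000)]    # model is correct
unif_sample = [rng.uniform(0.0, 1.0) for _ in range(20000)]  # model is wrong

ratio_correct = information_ratio(exp_sample)
ratio_wrong = information_ratio(unif_sample)
```

When the exponential model is correct the ratio is close to 1; for the uniform sample it settles near 1/3, which is the kind of departure a test based on this equivalence looks for.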
-
A new weighting method when not all the events are selected as cases in a nested case-control study
Authors:
Qian M. Zhou,
Xuan Wang,
Yingye Zheng,
Tianxi Cai
Abstract:
Nested case-control (NCC) is a sampling method widely used for developing and evaluating risk models with expensive biomarkers on large prospective cohort studies. The biomarker values are typically obtained on a sub-cohort, consisting of all the events and a subset of non-events. However, when the number of events is not small, it might not be affordable to measure the biomarkers on all of them. Due to the costs and limited availability of bio-specimens, only a subset of events is selected to the sub-cohort as cases. For these "untypical" NCC studies, we propose a new weighting method for the inverse probability weighted (IPW) estimation. We also design a perturbation method to estimate the variance of the IPW estimator with our new weights. It accounts for between-subject correlations induced by the sampling processes for both cases and controls through perturbing their sampling indicator variables, and thus, captures all the variations. Furthermore, we demonstrate, analytically and numerically, that when cases consist of only a subset of events, our new weight produces more efficient IPW estimators than the weight proposed in Samuelsen (1997) for a standard NCC design. We illustrate the estimating procedure with a study that aims to evaluate a biomarker-based risk prediction model using the Framingham cohort study.
Submitted 6 April, 2021;
originally announced April 2021.
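For context on the baseline the abstract above compares against, here is a minimal sketch of the Samuelsen (1997) inclusion probabilities for a standard NCC design, in which every event is a case and `m` controls are drawn from the risk set at each event time. It does not implement the new weight proposed in the paper, whose form is not given in the abstract. The function name and toy data are hypothetical, and ties and delayed entry are ignored.

```python
def samuelsen_inclusion_prob(times, events, m):
    """Probability that each subject enters the NCC sub-cohort.

    times  : follow-up time per subject
    events : 1 if the subject had the event, 0 if censored
    m      : number of controls sampled per case
    Cases are included with probability 1 (standard design).
    """
    n = len(times)
    probs = []
    for i in range(n):
        if events[i] == 1:
            probs.append(1.0)
            continue
        never_sampled = 1.0
        for j in range(n):
            # Subject i can serve as a control for case j only while at risk.
            if events[j] == 1 and times[j] <= times[i]:
                at_risk = sum(1 for k in range(n) if times[k] >= times[j])
                never_sampled *= 1.0 - min(1.0, m / (at_risk - 1))
        probs.append(1.0 - never_sampled)
    return probs

# Toy cohort: events at times 1 and 3, one control per case.
probs = samuelsen_inclusion_prob([1, 2, 3, 4, 5], [1, 0, 1, 0, 0], m=1)
weights = [1.0 / p for p in probs]   # inverse probability weights
```

Subjects followed longer are eligible as controls at more event times, so their inclusion probability is higher and their IPW weight smaller.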
-
Is the new model better? One metric says yes, but the other says no. Which metric do I use?
Authors:
Qian M. Zhou,
Zhe Lu,
Russell J. Brooke,
Melissa M Hudson,
Yan Yuan
Abstract:
Incremental value (IncV) evaluates the performance change from an existing risk model to a new model. It is one of the key considerations in deciding whether a new risk model performs better than the existing one. Problems arise when different IncV metrics contradict each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. This phenomenon of conflicting conclusions is not uncommon, and it creates a dilemma in medical decision making. In this article, we examine the analytical connections and differences between two IncV metrics: IncV in AUC (IncV-AUC) and IncV in AP (IncV-AP). Additionally, since they are both semi-proper scoring rules, we compare them with a strictly proper scoring rule: the IncV of the scaled Brier score (IncV-sBrS), via a numerical study. We demonstrate that both IncV-AUC and IncV-AP are weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions between events and non-events. However, IncV-AP assigns heavier weights to the changes in the high-risk group, whereas IncV-AUC weights the changes equally. In the numerical study, we find that IncV-AP has a wide range, from negative to positive, but the size of IncV-AUC is much smaller. In addition, IncV-AP and IncV-sBrS are highly consistent, but IncV-AUC is negatively correlated with IncV-sBrS and IncV-AP at a low event rate. IncV-AUC and IncV-AP are the least consistent among the three pairs, and their differences are more pronounced as the event rate decreases.
Submitted 15 December, 2020; v1 submitted 19 October, 2020;
originally announced October 2020.
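The conflict between IncV-AUC and IncV-AP described in the abstract above can be reproduced on a toy data set. The sketch below is an illustration under assumptions: the scores are invented, IncV is taken as the simple difference (new minus old), and the empirical AUC and AP estimators are the textbook ones rather than anything specific to the paper.

```python
def auc(scores, labels):
    """Chance that a random event outscores a random non-event (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean, over events, of the precision at each event's own score threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    total = 0.0
    for threshold in pos:
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        total += sum(flagged) / len(flagged)
    return total / len(pos)

# Two events followed by six non-events; only the event scores differ by model.
labels     = [1,   1,    0,   0,   0,   0,   0,   0]
old_scores = [0.6, 0.5,  0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
new_scores = [0.9, 0.25, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

incv_auc = auc(new_scores, labels) - auc(old_scores, labels)
incv_ap = (average_precision(new_scores, labels)
           - average_precision(old_scores, labels))
```

The new model pushes one event to the very top of the ranking and drops the other into the middle. AP rewards the gain at the high-risk end, so `incv_ap` is positive, while AUC counts every event/non-event pair equally, so `incv_auc` is negative.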
-
A Threshold-free Prospective Prediction Accuracy Measure for Censored Time to Event Data
Authors:
Yan Yuan,
Qian M. Zhou,
Bingying Li,
Hengrui Cai,
Eric J. Chow,
Gregory T. Armstrong
Abstract:
Prediction performance of a risk scoring system needs to be carefully assessed before its adoption in clinical practice. Clinical preventive care often uses risk scores to screen asymptomatic populations. The primary clinical interest is to predict the risk of having an event by a pre-specified future time $t_0$. Prospective accuracy measures such as positive predictive values have been recommended for evaluating the predictive performance. However, for commonly used continuous or ordinal risk score systems, these measures require a subjective cutoff threshold value that dichotomizes the risk scores. The need for a cutoff value creates barriers for practitioners and researchers. In this paper, we propose a threshold-free summary index of positive predictive values that accommodates time-dependent event status. We develop a nonparametric estimator and provide an inference procedure for comparing this summary measure between competing risk scores for censored time to event data. We conduct a simulation study to examine the finite-sample performance of the proposed estimation and inference procedures. Lastly, we illustrate the use of this measure on a real data example, comparing two risk score systems for predicting heart failure in childhood cancer survivors.
Submitted 14 September, 2016; v1 submitted 13 June, 2016;
originally announced June 2016.