-
Survival Data Simulation With the R Package rsurv
Authors:
Fábio N. Demarqui
Abstract:
In this paper we propose a novel R package, called rsurv, developed for general-purpose survival data simulation. The package is built on a new approach to simulating survival data that relies heavily on dplyr verbs. The proposed package allows simulation of survival data from a wide range of regression models, including accelerated failure time (AFT), proportional hazards (PH), proportional odds (PO), accelerated hazard (AH), Yang and Prentice (YP), and extended hazard (EH) models. The package rsurv also stands out for its ability to generate survival data from an unlimited number of baseline distributions, provided that an implementation of the quantile function of the chosen baseline distribution is available in R. Another useful feature of rsurv is that linear predictors are specified using R formulas, which facilitates the inclusion of categorical variables, interaction terms, and offset variables. The functions implemented in rsurv can also be used to simulate survival data with more complex structures, such as survival data with different types of censoring mechanisms, survival data with a cure fraction, survival data with random effects (frailties), multivariate survival data, and competing risks survival data.
Submitted 3 June, 2024;
originally announced June 2024.
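The quantile-based simulation idea described in this abstract can be sketched in a few lines. The following Python snippet is only an illustration of inverse-transform sampling under a PH model, not the rsurv interface: the function names, the exponential baseline, and the value of `beta` are all assumptions made for the example. The baseline quantile is written on the survival (upper-tail) scale to avoid loss of precision when the survival probability is very small.

```python
import math
import random


def exp_upper_quantile(s, rate=1.0):
    """Time t at which an exponential baseline has survival probability s."""
    return -math.log(s) / rate


def simulate_ph_times(linear_predictors, baseline_upper_quantile, rng):
    """Draw one survival time per linear predictor under a PH model.

    Under PH, S(t | x) = S0(t) ** exp(lp). Setting S(t | x) = u with
    u uniform on (0, 1] gives S0(t) = u ** exp(-lp), which is then
    inverted with the baseline quantile on the survival scale.
    """
    times = []
    for lp in linear_predictors:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        times.append(baseline_upper_quantile(u ** math.exp(-lp)))
    return times


# Hypothetical setup: one binary covariate with hazard ratio 2.
rng = random.Random(2024)
beta = math.log(2.0)
x = [i % 2 for i in range(20000)]
times = simulate_ph_times([beta * xi for xi in x], exp_upper_quantile, rng)

mean_control = sum(t for t, xi in zip(times, x) if xi == 0) / x.count(0)
mean_treated = sum(t for t, xi in zip(times, x) if xi == 1) / x.count(1)
print(mean_control, mean_treated)  # should be close to 1 and 0.5
```

Swapping `exp_upper_quantile` for the quantile of another baseline distribution changes the baseline without touching the sampling routine, which is the property the abstract highlights.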
-
The Analysis of Criminal Recidivism: A Hierarchical Model-Based Approach for the Analysis of Zero-Inflated, Spatially Correlated recurrent events Data
Authors:
Alisson C. C. Silva,
Fábio N. Demarqui,
Bráulio F. Silva,
Marcos O. Prates
Abstract:
The life course perspective in criminology has become prominent in recent years, offering valuable insights into various patterns of criminal offending and pathways. The study of criminal trajectories aims to understand the beginning, persistence, and desistance in crime, providing intriguing explanations about these moments in life. Central to this analysis is the identification of patterns in the frequency of criminal victimization and recidivism, along with the factors that contribute to them. Specifically, this work introduces a new class of models that overcome limitations in traditional methods used to analyze criminal recidivism. These models are designed for recurrent events data characterized by an excess of zeros and spatial correlation. They extend the Non-Homogeneous Poisson Process, incorporating spatial dependence in the model through random effects, enabling the analysis of associations among individuals within the same spatial stratum. To deal with the excess of zeros in the data, a zero-inflated Poisson mixed model was incorporated. In addition to parametric models following the Power Law process for baseline intensity functions, we propose flexible semi-parametric versions approximating the intensity function using Bernstein Polynomials. The Bayesian approach offers advantages such as incorporating external evidence and modeling specific correlations between random effects and observed data. The performance of these models was evaluated in a simulation study with various scenarios, and we applied them to analyze criminal recidivism data in the Metropolitan Region of Belo Horizonte, Brazil. The results provide a detailed analysis of high-risk areas for recurrent crimes and the behavior of recidivism rates over time. This research significantly enhances our understanding of criminal trajectories, paving the way for more effective strategies in combating criminal recidivism.
Submitted 29 August, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
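To illustrate how zero inflation handles an excess of zeros, here is a minimal sketch of the basic zero-inflated Poisson probability mass function in Python. It covers only the textbook ZIP component, not the spatial random effects or intensity functions of the models in this paper, and the parameter names `pi_zero` and `mu` are chosen for the example.

```python
import math


def zip_pmf(k, pi_zero, mu):
    """P(Y = k) for a zero-inflated Poisson variable.

    pi_zero: probability of a structural zero (0 <= pi_zero < 1)
    mu: mean of the Poisson component (mu > 0)
    """
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    if k == 0:
        # A zero arises either structurally or from the Poisson component.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# Compare the probability of zero with and without inflation.
print(zip_pmf(0, 0.0, 2.0), zip_pmf(0, 0.3, 2.0))
```

With `pi_zero = 0` the function reduces to the ordinary Poisson distribution, so the inflation parameter directly measures the extra mass at zero.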
-
A Class of Semiparametric Yang and Prentice Frailty Models
Authors:
Cassius Henrique Xavier Oliveira,
Fabio Nogueira Demarqui,
Vinicius Diniz Mayrink
Abstract:
The Yang and Prentice (YP) regression models have garnered interest from the scientific community due to their ability to analyze data whose survival curves intersect. These models include the proportional hazards (PH) and proportional odds (PO) models as special cases. However, they encounter limitations when dealing with multivariate survival data due to potential dependencies between the times-to-event. One solution is to introduce a frailty term into the hazard functions, so that the times-to-event can be considered independent given the frailty term. In this study, we propose a new class of YP models that incorporate frailty. We use the exponential distribution, the piecewise exponential (PE) distribution, and Bernstein polynomials (BP) as baseline functions. Our approach adopts a Bayesian methodology. The proposed models are evaluated through a simulation study, which shows that the YP frailty models with BP and PE baselines perform similarly to the parametric model that generated the data. We apply the models to two real data sets.
Submitted 14 November, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Semiparametric Modeling for Multivariate Survival Data via Copulas
Authors:
W. D. R. Miranda Filho,
F. N. Demarqui
Abstract:
We propose a new class of multivariate survival models based on Archimedean copulas with margins modeled by the Yang and Prentice (YP) model. The Ali-Mikhail-Haq (AMH), Clayton, Frank, Gumbel-Hougaard (GH), and Joe copulas are employed to accommodate the dependency among marginal distributions. Baseline distributions are modeled semiparametrically by the piecewise exponential (PE) distribution and Bernstein polynomials. The new class of models possesses some attractive features: i) the ability to take into account survival data with crossing survival curves; ii) the inclusion of the well-known proportional hazards (PH) and proportional odds (PO) models as particular cases; iii) greater flexibility provided by the semiparametric modeling of the marginal baseline distributions; iv) the availability of closed-form expressions for the likelihood functions, leading to more straightforward inferential procedures. We conducted an extensive Monte Carlo simulation study to evaluate the performance of the proposed model. Finally, we demonstrate the versatility of our new class of models through the analysis of survival data involving patients diagnosed with ovarian cancer.
Submitted 7 March, 2022;
originally announced March 2022.
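As a small illustration of how an Archimedean copula induces dependence between two margins, the sketch below implements the bivariate Clayton copula and a conditional-inversion sampler in Python. It is a generic textbook construction, not the authors' code: the function names and the value of `theta` are assumptions, and the uniform margins would still need to be mapped to survival times through the chosen marginal model.

```python
import random


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from the Clayton copula by conditional inversion.

    Given u, solving dC/du = w for v yields
    v = (u**-theta * (w**(-theta / (1 + theta)) - 1) + 1) ** (-1 / theta).
    """
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()  # uniform on (0, 1]
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


# Hypothetical dependence level: theta = 2 corresponds to Kendall's tau of 0.5.
pairs = sample_clayton(5, 2.0, random.Random(1))
print(pairs)
```

For the Clayton family, Kendall's tau equals `theta / (theta + 2)`, which gives a quick way to check a sampler against the intended strength of association.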
-
Product Partition Dynamic Generalized Linear Models
Authors:
Victor S. Comitti,
Fábio N. Demarqui,
Thiago R. dos Santos,
Jéssica da Assunção Almeida
Abstract:
Detection and modeling of change-points in time series can be considerably challenging. In this paper we approach this problem by incorporating the class of Dynamic Generalized Linear Models (DGLM) into the well-known class of Product Partition Models (PPM). This new methodology, which we call DGLM-PPM, extends the PPM to distributions within the Exponential Family while retaining the flexibility of the DGLM class. It also provides a framework for Bayesian multiple change-point detection in dynamic regression models. Inference on the DGLM-PPM follows the evolution and updating steps of the DGLM class. A Gibbs sampler scheme with an appended Adaptive Rejection Metropolis Sampling (ARMS) step is used to compute posterior estimates of the relevant quantities. A simulation study shows that the proposed model provides reasonable estimates of the dynamic parameters and also assigns high change-point probabilities to the breaks introduced in the artificial data generated for this work. We also present a real-life data example that highlights the superiority of the DGLM-PPM over the conventional DGLM in both in-sample and out-of-sample goodness-of-fit measures.
Submitted 3 March, 2021;
originally announced March 2021.
-
pexm: a JAGS module for applications involving the piecewise exponential distribution
Authors:
Vinícius D. Mayrink,
João Daniel N. Duarte,
Fábio N. Demarqui
Abstract:
In this study, we present a new module built for users interested in a programming language similar to BUGS to fit a Bayesian model based on the piecewise exponential (PE) distribution. The module is an extension to the open-source program JAGS, by which a Gibbs sampler can be applied without requiring the derivation of complete conditionals and the subsequent implementation of strategies to draw samples from unknown distributions. The PE distribution is widely used in the fields of survival analysis and reliability. Currently, it can only be implemented in JAGS through methods that indirectly specify the likelihood based on Poisson or Bernoulli probabilities. Our module provides a more straightforward implementation and is thus more attractive to researchers aiming to spend more time exploring the results of the Bayesian inference rather than implementing the Markov Chain Monte Carlo (MCMC) algorithm. For those interested in extending JAGS, this work can be seen as a tutorial including important information not well investigated or organized in other materials. Here, we describe how to use the module, taking advantage of the interface between R and JAGS. A short simulation study is developed to ensure that the module behaves well, and a real illustration, involving two PE models, exhibits a context where the module can be used in practice.
Submitted 26 April, 2020;
originally announced April 2020.
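For readers unfamiliar with the PE distribution, the following Python sketch computes its cumulative hazard and survival function from a time grid and a set of interval-specific hazard rates. It is a plain illustration of the distribution itself and does not reproduce the pexm module or its JAGS syntax; the function names and the example grid are assumptions.

```python
import math


def pe_cumulative_hazard(t, cuts, rates):
    """Cumulative hazard H(t) of a piecewise exponential distribution.

    cuts: increasing interior cut points a_1 < ... < a_{m-1}, all positive
    rates: m positive hazard rates, one per interval; the last interval
           extends from a_{m-1} to infinity
    """
    if len(rates) != len(cuts) + 1:
        raise ValueError("need exactly one more rate than cut points")
    total = 0.0
    lower = 0.0
    for upper, rate in zip(list(cuts) + [math.inf], rates):
        if t <= lower:
            break
        # Add the hazard accumulated over the part of this interval before t.
        total += rate * (min(t, upper) - lower)
        lower = upper
    return total


def pe_survival(t, cuts, rates):
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-pe_cumulative_hazard(t, cuts, rates))


# Hypothetical grid: hazard 0.5 on (0, 1], 1.0 on (1, 2], 2.0 afterwards.
print(pe_survival(2.5, [1.0, 2.0], [0.5, 1.0, 2.0]))
```

Because the hazard is constant within each interval, the cumulative hazard is piecewise linear, which is what keeps the likelihood tractable.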
-
An Unified Semiparametric Approach to Model Lifetime Data with Crossing Survival Curves
Authors:
Fabio N. Demarqui,
Vinicius D. Mayrink,
Sujit K. Ghosh
Abstract:
The proportional hazards (PH), proportional odds (PO) and accelerated failure time (AFT) models have been widely used in different applications of survival analysis. Despite their popularity, these models are not suitable to handle lifetime data with crossing survival curves. In 2005, Yang and Prentice proposed a semiparametric two-sample strategy (YP model), including the PH and PO frameworks as particular cases, to deal with this type of data. Assuming a general regression setting, the present paper proposes a unified approach to fit the YP model by employing Bernstein polynomials to manage the baseline hazard and odds under both the frequentist and Bayesian frameworks. The use of Bernstein polynomials has several advantages: it allows for uniform approximation of the baseline distribution, it leads to closed-form expressions for all baseline functions, it simplifies the inference procedure, and the presence of a continuous survival function allows a more accurate estimation of the crossing survival time. Extensive simulation studies are carried out to evaluate the behavior of the models. The analysis of a clinical trial data set, related to non-small-cell lung cancer, is also developed as an illustration. Our findings indicate that assuming the usual PH model, ignoring the existing crossing survival feature in the real data, is a serious mistake with implications for those patients in the initial stage of treatment.
Submitted 10 October, 2019;
originally announced October 2019.
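The way the YP model nests PH and PO and produces crossing survival curves can be checked numerically. The sketch below assumes the commonly used form S(t | x) = [1 + (λ/θ) R0(t)]^(-θ), where λ and θ are the short-term and long-term hazard ratios and R0(t) is the baseline odds of failure; the abstract itself does not state this formula, so the parameterization, the function names, and the exponential baseline are all assumptions made for illustration.

```python
import math


def yp_survival(t, short_term_hr, long_term_hr, baseline_cum_hazard):
    """Survival probability at time t under the assumed YP form.

    short_term_hr: hazard ratio near t = 0 (lambda)
    long_term_hr: hazard ratio as t grows large (theta)
    baseline_cum_hazard: function returning the baseline H0(t)
    """
    # Baseline odds of failure: R0(t) = (1 - S0(t)) / S0(t) = exp(H0(t)) - 1.
    baseline_odds = math.expm1(baseline_cum_hazard(t))
    ratio = short_term_hr / long_term_hr
    return (1.0 + ratio * baseline_odds) ** (-long_term_hr)


def unit_exponential_cum_hazard(t):
    """Cumulative hazard of an exponential baseline with rate 1."""
    return t


# Hypothetical group with a higher early hazard and a lower late hazard.
for t in (0.1, 5.0):
    baseline = math.exp(-t)
    group = yp_survival(t, 2.0, 0.5, unit_exponential_cum_hazard)
    print(t, baseline, group)
```

Under this form, equal short-term and long-term ratios recover the PH model, a long-term ratio of 1 recovers the PO model, and ratios on opposite sides of 1 make the group curve start below the baseline curve and end above it.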
-
A fully likelihood-based approach to model survival data with crossing survival curves
Authors:
Fabio N. Demarqui,
Vinicius D. Mayrink
Abstract:
Proportional hazards (PH), proportional odds (PO) and accelerated failure time (AFT) models have been widely used to deal with survival data in different fields of knowledge. Despite their popularity, such models are not suitable to handle survival data with crossing survival curves. Yang and Prentice (2005) proposed a semiparametric two-sample approach, denoted here as the YP model, allowing the analysis of crossing survival curves and including the PH and PO configurations as particular cases. In a general regression setting, the present work proposes a fully likelihood-based approach to fit the YP model. The main idea is to model the baseline hazard via the piecewise exponential (PE) distribution. The approach shares the flexibility of the semiparametric models and the tractability of the parametric representations. An extensive simulation study is developed to evaluate the performance of the proposed model. In addition, we demonstrate how useful the new method is through the analysis of survival times related to patients enrolled in a cancer clinical trial. The simulation results indicate that our model performs well for moderate sample sizes in the general regression setting. A superior performance is also observed with respect to the original YP model designed for the two-sample scenario.
Submitted 6 October, 2019;
originally announced October 2019.
-
Modeling the Association Structure in Doubly Robust GEE for Longitudinal Ordinal Missing Data
Authors:
José Luiz P. da Silva,
Enrico A. Colosimo,
Fábio N. Demarqui
Abstract:
Generalized Estimating Equations (GEE) are a well-known method for the analysis of categorical longitudinal responses. The GEE method offers computational simplicity and a population-level interpretation of the parameters. In the presence of missing data, it is valid only under the strong assumption of missing completely at random. A doubly robust estimator (DRGEE) for correlated ordinal longitudinal data is a useful approach for handling intermittently missing response and covariate under the missing at random (MAR) mechanism. An independence working correlation is the standard choice in DRGEE. However, when a covariate is not time-stationary, efficiency can be gained by using a structured association. The goal of this paper is to extend the DRGEE estimator to allow modeling of the association structure by means of either the correlation coefficient or the local odds ratio. Simulation results revealed better performance of the local odds ratio parametrization, especially for small samples. The method is applied to a data set related to Rheumatic Mitral Stenosis.
Submitted 14 June, 2015;
originally announced June 2015.
-
Doubly Robust-Based Generalized Estimating Equations for the Analysis of Longitudinal Ordinal Missing Data
Authors:
José Luiz P. da Silva,
Enrico A. Colosimo,
Fábio N. Demarqui
Abstract:
Generalized Estimating Equations (GEE) are a well-known method for the analysis of non-Gaussian longitudinal data. This method offers computational simplicity and a marginal interpretation of the parameters. However, in the presence of missing data, it is valid only under the strong assumption of missing completely at random (MCAR). Some corrections are available when the missing data mechanism is missing at random (MAR): inverse probability weighting (WGEE) and multiple imputation (MIGEE). To obtain consistent estimates, the weight model must be correctly specified for WGEE, or the imputation model for MIGEE. A recent method combining ideas from these two approaches has the doubly robust property: for consistency, it requires only one of the weight model or the imputation model to be correct. This work assumes a proportional odds model and proposes a doubly robust estimator for the analysis of ordinal longitudinal data with intermittently missing response and covariate under the MAR mechanism. Simulation results revealed better performance of the proposed method compared to WGEE and MIGEE. The method is applied to a data set related to the Analgesia Pain in Childbirth study.
Submitted 14 June, 2015;
originally announced June 2015.