Comparative Review of Modern Competing Risk Methods in High-dimensional Settings
Authors:
Paul M. Djangang,
Summer S. Han,
Nilotpal Sanyal
Abstract:
Competing risk analysis accounts for multiple mutually exclusive events, improving risk estimation over traditional survival analysis. Despite methodological advancements, a comprehensive comparison of competing risk methods, especially in high-dimensional settings, remains limited. This study evaluates penalized regression (LASSO, SCAD, MCP), boosting (CoxBoost, CB), random forest (RF), and deep learning (DeepHit, DH) methods for competing risk analysis through extensive simulations, assessing variable selection, estimation accuracy, discrimination, and calibration under diverse data conditions. Our results show that CB achieves the best variable selection, estimation stability, and discriminative ability, particularly in high-dimensional settings, while MCP and SCAD provide superior calibration in $n>p$ scenarios. RF and DH capture nonlinear effects but exhibit instability, with RF showing high false discovery rates and DH suffering from poor calibration. Further, we compare the flexibility of these methods through the analysis of a melanoma gene expression dataset with survival information. This study provides practical guidelines for selecting competing risk models to ensure robust and interpretable analysis in high-dimensional settings and outlines important directions for future research.
Submitted 2 April, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
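As a minimal illustration of the quantity that competing risk methods target, the sketch below computes a nonparametric (Aalen-Johansen) cumulative incidence curve for one cause. It is not taken from the paper: the function name `cumulative_incidence`, the event coding (0 for censored, positive integers for causes), and the toy data are assumptions made for this example only.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Aalen-Johansen estimate of the cumulative incidence of one cause.

    times  : observed follow-up times
    events : 0 = censored, positive integer = cause of the event (assumed coding)
    cause  : the cause whose cumulative incidence is wanted
    Returns a list of (time, cumulative incidence) at each time with an event.
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    surv = 1.0   # overall event-free survival just before the current time
    cif = 0.0    # running cumulative incidence for `cause`
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        # Count the events of each type that share this time point.
        counts = Counter()
        j = i
        while j < len(order) and order[j][0] == t:
            counts[order[j][1]] += 1
            j += 1
        d_all = sum(c for e, c in counts.items() if e != 0)
        if d_all > 0:
            # Increment: P(event-free so far) * cause-specific hazard at t.
            cif += surv * counts[cause] / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= j - i  # events and censorings both leave the risk set
        i = j
    return curve


# Toy data (hypothetical): two causes plus one censored subject.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 2, 0, 1, 2]
cif_cause1 = cumulative_incidence(toy_times, toy_events, cause=1)
cif_cause2 = cumulative_incidence(toy_times, toy_events, cause=2)
```

Because the increments are weighted by overall event-free survival rather than by cause-specific survival, the cause-wise curves cannot sum to more than one, which is the property that a naive Kaplan-Meier analysis treating other causes as censoring does not guarantee.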
Uncovering shared common genetic risk factors for various aspects of complex disorders captured in multiple traits
Authors:
Summer S. Han,
Elena L. Grigorenko,
Joseph T. Chang
Abstract:
Identifying shared genetic risk factors for multiple measured traits has been of great interest in studying complex disorders. Marlow's (2003) method for detecting shared gene effects on complex traits has been highly influential in the literature of neurodevelopmental disorders as well as other disorders including obesity and asthma. Although this method has been widely applied and has been recommended as potentially powerful, its validity and power have not been examined either theoretically or by simulation. This paper establishes the validity of the method and quantifies and explains its power. We show that the method has correct type I error rates regardless of the number of traits in the model, and confirm that power increases compared to standard univariate methods across different genetic models. We find that the main source of these power gains is correlations among traits induced by a common major gene effect component. We compare the use of the complete pleiotropy model, as assumed by Marlow, to the use of a more general model allowing additional correlation parameters, and find that even when the true model includes those parameters, the complete pleiotropy model is more powerful as long as traits are moderately correlated by a major gene component. We implement this method and a power calculator in software that can assist in designing studies by using pilot data to calculate required sample sizes and choose traits for further linkage studies. We apply the software to data on reading disability in the Russian language.
Submitted 14 April, 2009;
originally announced April 2009.
Reconsidering the asymptotic null distribution of likelihood ratio tests for genetic linkage in multivariate variance components models
Authors:
Summer S. Han,
Joseph T. Chang
Abstract:
Accurate knowledge of the null distribution of hypothesis tests is important for valid application of the tests. In previous papers and software, the asymptotic null distribution of likelihood ratio tests for detecting genetic linkage in multivariate variance components models has been stated to be a mixture of chi-square distributions with binomial mixing probabilities. Here we show, by simulation and by theoretical arguments based on the geometry of the parameter space, that all aspects of the previously stated asymptotic null distribution are incorrect: both the binomial mixing probabilities and the chi-square components. Correcting the null distribution gives more conservative critical values than previously stated, yielding P values that can easily be ten times larger. The true mixing probabilities give the highest probability to the case where all variance parameters are estimated positive, and the mixing components show severe departures from chi-square distributions. Thus, the asymptotic null distribution has complex features that raise challenges for the assessment of significance of multivariate linkage findings. We propose a method to generate an asymptotic null distribution that is much faster than other empirical methods such as gene-dropping, enabling us to obtain P values with higher precision more efficiently.
Submitted 13 September, 2008; v1 submitted 14 August, 2008;
originally announced August 2008.
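For readers unfamiliar with the general shape of a chi-square mixture null distribution, the sketch below computes a P value under a mixture with binomial mixing probabilities. It illustrates only the form of the previously stated distribution, which the abstract above reports to be incorrect for multivariate linkage tests; it is not the corrected distribution or the authors' method. The function names, the choice of degrees of freedom, and the number of components are assumptions for this example.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 0.

    df = 0 is treated as a point mass at zero.
    """
    if x <= 0:
        return 1.0
    if df == 0:
        return 0.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then the standard recursion
    # Q(df + 2) = Q(df) + half**(df/2) * exp(-half) / Gamma(df/2 + 1).
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


def binomial_weights(k):
    """Binomial(k, 1/2) mixing probabilities for components 0..k."""
    return [math.comb(k, j) / 2 ** k for j in range(k + 1)]


def mixture_pvalue(x, weights, dfs):
    """P value of statistic x under a weighted mixture of chi-squares."""
    return sum(w * chi2_upper_tail(x, d) for w, d in zip(weights, dfs))


# Assumed illustration: one constrained parameter, components chi2_0 and chi2_1.
p_mixture = mixture_pvalue(2.705543, binomial_weights(1), dfs=[0, 1])
p_plain = chi2_upper_tail(2.705543, 1)
```

In this one-parameter illustration the mixture P value is half of the plain chi-square P value, which shows how sensitive reported significance is to the assumed mixing probabilities and components.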