Identification of Outlying Observations with Quantile Regression for Censored Data
Authors:
Soo-Heang Eo,
Seung-Mo Hong,
HyungJun Cho
Abstract:
Outlying observations, which deviate markedly from other measurements, can distort the conclusions of a data analysis. Identifying outliers is therefore an important step toward obtaining reliable results. While there are many statistical outlier detection algorithms and software programs for uncensored data, few are available for censored data. In this article, we propose three outlier detection algorithms based on censored quantile regression. Two are modified versions of existing algorithms for uncensored or censored data; the third is a newly developed algorithm that overcomes the drawbacks of the previous approaches. The performance of the three algorithms was investigated in simulation studies. In addition, real data from the SEER database, which contains a variety of data sets on various cancers, are used to illustrate the usefulness of our methodology. The algorithms are implemented in the R package OutlierDC, which can be conveniently used in the R environment and is freely available from CRAN.
Submitted 30 April, 2014;
originally announced April 2014.
K-Adaptive Partitioning for Survival Data, with an Application to Cancer Staging
Authors:
Soo-Heang Eo,
Hyo Jeong Kang,
Seung-Mo Hong,
HyungJun Cho
Abstract:
In medical research, it is often necessary to obtain subgroups with heterogeneous survival, as predicted from a prognostic factor. For this purpose, a binary split has often been used once or recursively; however, binary partitioning may not provide an optimal set of well-separated subgroups. We propose a multi-way partitioning algorithm that divides the data into K heterogeneous subgroups based on the information from a prognostic factor. The resulting subgroups show significant differences in survival. The multi-way partition is found by maximizing the minimum of the pairwise test statistics between subgroups, and the optimal number of subgroups is determined by a permutation test. The algorithm is compared with two binary recursive partitioning algorithms. In addition, its usefulness is demonstrated with real data on colorectal cancer cases from the Surveillance, Epidemiology, and End Results (SEER) program. We have implemented the algorithm in the R package kaps, which is freely available from the Comprehensive R Archive Network (CRAN).
Submitted 1 November, 2014; v1 submitted 19 June, 2013;
originally announced June 2013.
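The max-min criterion described in the K-adaptive partitioning abstract can be illustrated with a small sketch. This is not the authors' R implementation: the function names `logrank_chi2` and `max_min_partition`, the `min_size` parameter, the use of the log-rank statistic as the pairwise test, the comparison of all subgroup pairs (rather than, say, only adjacent ones), and the exhaustive search over cut points are all assumptions made for illustration. The permutation test for choosing K is not included.

```python
from itertools import combinations


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-sample log-rank chi-square statistic (1 degree of freedom)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    # Distinct times at which at least one event (not censoring) occurred.
    event_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in pooled if s >= t)                # at risk, pooled
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)   # at risk, group a
        d = sum(1 for s, e, _ in pooled if s == t and e)          # events, pooled
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0


def max_min_partition(x, times, events, k, min_size=2):
    """Search cut points on x for k subgroups, maximizing the minimum
    pairwise log-rank statistic (assumption: all pairs are compared)."""
    candidates = sorted(set(x))[:-1]  # a cut c sends observations with x <= c left
    best_score, best_cuts = -1.0, None
    for cuts in combinations(candidates, k - 1):
        bounds = list(cuts) + [float("inf")]
        groups = [[] for _ in range(k)]
        for xi, ti, ei in zip(x, times, events):
            idx = next(i for i, b in enumerate(bounds) if xi <= b)
            groups[idx].append((ti, ei))
        if any(len(g) < min_size for g in groups):
            continue  # skip partitions with subgroups that are too small
        score = min(
            logrank_chi2([t for t, _ in ga], [e for _, e in ga],
                         [t for t, _ in gb], [e for _, e in gb])
            for ga, gb in combinations(groups, 2)
        )
        if score > best_score:
            best_score, best_cuts = score, cuts
    return best_cuts, best_score
```

Calling `max_min_partition(x, times, events, k=3)` returns the chosen cut points and the corresponding minimum pairwise statistic. The exhaustive search grows combinatorially with the number of distinct factor values, so it is only practical for small examples; it is meant to make the objective concrete rather than to be efficient.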