-
New Insight of Spatial Scan Statistics via Regression Model
Authors:
Takayuki Kawashima,
Daisuke Yoneoka,
Yuta Tanoue,
Akifumi Eguchi,
Shuhei Nomura
Abstract:
The spatial scan statistic is widely used to detect disease clusters in epidemiological surveillance. Since the seminal work by~\cite{kulldorff1997}, numerous extensions have emerged, including methods for defining scan regions, detecting multiple clusters, and expanding statistical models. Notably,~\cite{jung2009} and~\cite{ZHANG20092851} introduced a regression-based approach that accounts for covariates and encompasses classical methods such as those of~\cite{kulldorff1997}. Another key extension is the expectation-based approach~\citep{neill2005anomalous,neillphdthesis}, which differs from the population-based approach represented by~\cite{kulldorff1997} in how the hypothesis test is formulated. In this paper, we bridge the regression-based approach with both the expectation-based and population-based approaches. We show that the two approaches differ only in the presence or absence of an intercept term in the regression model. Exploiting this difference, we propose new spatial scan statistics under the Gaussian and Bernoulli models. We further extend the regression-based approach by incorporating the well-known sparse L0 penalty and show that the derivation of spatial scan statistics can be expressed as an equivalent optimization problem. Our extended framework accommodates extensions such as space-time scan statistics and multiple-cluster detection while connecting naturally with existing spatial regression-based cluster detection. Through the relation to case-specific models~\citep{she2011,10.1214/11-STS377}, clusters detected by spatial scan statistics can be viewed as outliers in the sense of robust statistics. Numerical experiments with real data illustrate the behavior of our proposed statistics under specified settings.
Submitted 10 February, 2025;
originally announced February 2025.
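As a rough illustration of the two hypothesis-testing setups contrasted in the abstract above, the sketch below computes the classical Poisson log-likelihood ratios for a single candidate zone: the population-based form associated with Kulldorff and the expectation-based form associated with Neill. It shows only these textbook statistics, not the regression-based or penalized statistics proposed in the paper; the function and argument names are illustrative assumptions.

```python
import math


def _x_log_ratio(x, y):
    """Return x * log(x / y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / y)


def population_based_llr(cases_in, expected_in, total_cases):
    """Classical population-based Poisson log-likelihood ratio for one zone.

    expected_in is the null expectation inside the zone, scaled so that the
    expectations over the whole study area sum to total_cases.
    """
    if not 0 < expected_in < total_cases:
        raise ValueError("expected_in must lie strictly between 0 and total_cases")
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    # Only an elevated rate inside the zone counts as evidence of a cluster.
    if cases_in / expected_in <= cases_out / expected_out:
        return 0.0
    return _x_log_ratio(cases_in, expected_in) + _x_log_ratio(cases_out, expected_out)


def expectation_based_llr(cases_in, baseline_in):
    """Classical expectation-based Poisson log-likelihood ratio for one zone.

    baseline_in is the expected count inside the zone (for example, from
    historical data); counts outside the zone do not enter the statistic.
    """
    if baseline_in <= 0:
        raise ValueError("baseline_in must be positive")
    if cases_in <= baseline_in:
        return 0.0
    return cases_in * math.log(cases_in / baseline_in) + baseline_in - cases_in


if __name__ == "__main__":
    # Hypothetical zone: 20 observed cases, 10 expected, 100 cases overall.
    print(population_based_llr(20, 10, 100))
    print(expectation_based_llr(20, 10))
```

In a full scan, either function would be evaluated over every candidate zone and the maximum taken as the test statistic, with significance usually assessed by Monte Carlo replication.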
-
An Exact Solution Path Algorithm for SLOPE and Quasi-Spherical OSCAR
Authors:
Shunichi Nomura
Abstract:
The sorted $L_1$ penalization estimator (SLOPE) is a regularization technique that penalizes the sorted absolute values of coefficients in high-dimensional regression. By setting its regularization weights $λ$ freely under the monotonicity constraint, SLOPE can exhibit various feature selection and clustering properties. However, the selected features and their clusters are very sensitive to these weights, and exhaustively tracking their changes is difficult with grid search methods. This study presents a solution path algorithm that provides the complete and exact path of solutions for SLOPE as the regularization weights are tuned. A simple optimality condition for SLOPE is derived and used to specify the next splitting point of the solution path. This study also proposes a new design of the regularization sequence $λ$ for feature clustering, called the quasi-spherical and octagonal shrinkage and clustering algorithm for regression (QS-OSCAR). QS-OSCAR is designed so that the contour surface of its regularization term is as close as possible to a sphere. Simulation studies compare the sparsity and clustering performance of several regularization sequence designs. The numerical results show that QS-OSCAR performs feature clustering more efficiently than the other designs.
Submitted 29 October, 2020;
originally announced October 2020.
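To make the penalty in the abstract above concrete, here is a minimal sketch of the SLOPE regularization term, which pairs a non-increasing weight sequence with the absolute coefficients sorted in decreasing order. It also builds the linearly decreasing weights that reproduce the standard OSCAR penalty. The QS-OSCAR sequence is not reproduced here because the abstract does not give its formula; all function names are illustrative assumptions.

```python
def slope_penalty(beta, weights):
    """Sorted-L1 penalty: sum_i weights[i] * |beta|_(i), largest magnitude first."""
    if len(beta) != len(weights):
        raise ValueError("beta and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing (monotonicity constraint)")
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(w * m for w, m in zip(weights, magnitudes))


def oscar_weights(p, lam1, lam2):
    """Weights lam1 + lam2 * (p - i), i = 1..p, which express OSCAR as a SLOPE penalty."""
    return [lam1 + lam2 * (p - i) for i in range(1, p + 1)]


if __name__ == "__main__":
    beta = [1.0, -3.0, 2.0]            # hypothetical coefficient vector
    weights = oscar_weights(3, 1.0, 0.5)
    print(weights)
    print(slope_penalty(beta, weights))
```

With constant weights the penalty reduces to the ordinary Lasso penalty, which is one way to see how the choice of sequence controls the balance between sparsity and clustering.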
-
Efficient Path Algorithms for Clustered Lasso and OSCAR
Authors:
Atsumori Takahashi,
Shunichi Nomura
Abstract:
In high-dimensional regression, clustering features by their effects on outcomes is often as important as feature selection. For that purpose, clustered Lasso and the octagonal shrinkage and clustering algorithm for regression (OSCAR) are used to form feature groups automatically through a pairwise $L_1$ norm and a pairwise $L_\infty$ norm, respectively. This paper proposes efficient path algorithms for clustered Lasso and OSCAR that construct solution paths with respect to their regularization parameters. Although exhaustive pairwise regularization involves a large number of terms, the computational cost is reduced by exploiting the symmetry of those terms. Simple equivalent conditions for checking the subgradient equations in each feature group are derived using graph theory. Numerical experiments show that the proposed algorithms are more efficient than existing algorithms.
Submitted 16 June, 2020;
originally announced June 2020.
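The pairwise terms mentioned in the abstract above grow quadratically with the number of features, but both can be evaluated after a single sort. The sketch below compares the naive pairwise sums with sorted closed forms for the pairwise $L_1$ term (clustered Lasso) and the pairwise $L_\infty$ term (OSCAR). This is a standard sorting identity offered only as an illustration of why the pairwise structure is tractable; it is not the path algorithm proposed in the paper, and the function names are assumptions.

```python
from itertools import combinations


def pairwise_l1_naive(beta):
    """Sum of |beta_j - beta_k| over all pairs j < k, in O(p^2) time."""
    return sum(abs(a - b) for a, b in combinations(beta, 2))


def pairwise_l1_sorted(beta):
    """Same sum in O(p log p): the i-th smallest value gets coefficient 2i - p - 1."""
    p = len(beta)
    return sum((2 * i - p - 1) * b for i, b in enumerate(sorted(beta), start=1))


def pairwise_linf_naive(beta):
    """Sum of max(|beta_j|, |beta_k|) over all pairs j < k, in O(p^2) time."""
    return sum(max(abs(a), abs(b)) for a, b in combinations(beta, 2))


def pairwise_linf_sorted(beta):
    """Same sum in O(p log p): the i-th largest magnitude is the maximum in p - i pairs."""
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    p = len(magnitudes)
    return sum((p - i) * m for i, m in enumerate(magnitudes, start=1))


if __name__ == "__main__":
    beta = [1.0, -3.0, 2.0]  # hypothetical coefficient vector
    print(pairwise_l1_naive(beta), pairwise_l1_sorted(beta))
    print(pairwise_linf_naive(beta), pairwise_linf_sorted(beta))
```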