-
Change Point Detection in Pairwise Comparison Data with Covariates
Authors:
Yi Han,
Thomas C. M. Lee
Abstract:
This paper introduces the novel piecewise stationary covariate-assisted ranking estimation (PS-CARE) model for analyzing time-evolving pairwise comparison data, enhancing item ranking accuracy through the integration of covariate information. By partitioning the data into distinct, stationary segments, the PS-CARE model adeptly detects temporal shifts in item rankings, known as change points, whose number and positions are initially unknown. Leveraging the minimum description length (MDL) principle, this paper establishes a statistically consistent model selection criterion to estimate these unknowns. The practical optimization of this MDL criterion is done with the pruned exact linear time (PELT) algorithm. Empirical evaluations reveal the method's promising performance in accurately locating change points across various simulated scenarios. An application to an NBA dataset yielded meaningful insights that aligned with significant historical events, highlighting the method's practical utility and the MDL criterion's effectiveness in capturing temporal ranking changes. To the best of the authors' knowledge, this research pioneers change point detection in pairwise comparison data with covariate information, representing a significant leap forward in the field of dynamic ranking analysis.
Submitted 24 August, 2024;
originally announced August 2024.
-
Uncertainty Quantification in Ensembles of Honest Regression Trees using Generalized Fiducial Inference
Authors:
Suofei Wu,
Jan Hannig,
Thomas C. M. Lee
Abstract:
Because of their accuracy, methods based on ensembles of regression trees are a popular approach for making predictions. Some common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on honest random forests, which add honesty to the original form of random forests and have been proved to have better statistical properties. The main contribution is a new method that quantifies the uncertainties of the estimates and predictions produced by honest random forests. The proposed method is based on the generalized fiducial methodology, and provides a fiducial density function that measures how likely each single honest tree is to be the true model. With such a density function, estimates and predictions, as well as their confidence/prediction intervals, can be obtained. The promising empirical properties of the proposed method are demonstrated by numerical comparisons with several state-of-the-art methods, and by applications to a few real data sets. Lastly, the proposed method is theoretically backed up by a strong asymptotic guarantee.
Submitted 14 November, 2019;
originally announced November 2019.
-
Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting
Authors:
Miles E. Lopes,
Suofei Wu,
Thomas C. M. Lee
Abstract:
When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: Is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression, which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.
Submitted 3 August, 2019;
originally announced August 2019.
-
Piecewise quantile autoregressive modeling for nonstationary time series
Authors:
Alexander Aue,
Rex C. Y. Cheung,
Thomas C. M. Lee,
Ming Zhong
Abstract:
We develop a new methodology for the fitting of nonstationary time series that exhibit nonlinearity, asymmetry, local persistence and changes in location, scale and shape of the underlying distribution. In order to achieve this goal, we perform model selection in the class of piecewise stationary quantile autoregressive processes. The best model is defined in terms of minimizing a minimum description length criterion derived from an asymmetric Laplace likelihood. Its practical minimization is done with the use of genetic algorithms. If the data generating process indeed follows a piecewise quantile autoregression structure, we show that our method is consistent for estimating the break points and the autoregressive parameters. Empirical work suggests that the proposed method performs well in finite samples.
Submitted 28 September, 2016;
originally announced September 2016.
-
On image segmentation using information theoretic criteria
Authors:
Alexander Aue,
Thomas C. M. Lee
Abstract:
Image segmentation is a long-studied and important problem in image processing. Different solutions have been proposed, many of which follow the information theoretic paradigm. While these information theoretic segmentation methods often produce excellent empirical results, their theoretical properties are still largely unknown. The main goal of this paper is to conduct a rigorous theoretical study into the statistical consistency properties of such methods. To be more specific, this paper investigates if these methods can accurately recover the true number of segments together with their true boundaries in the image as the number of pixels tends to infinity. Our theoretical results show that both the Bayesian information criterion (BIC) and the minimum description length (MDL) principle can be applied to derive statistically consistent segmentation methods, while the same is not true for the Akaike information criterion (AIC). Numerical experiments were conducted to illustrate and support our theoretical findings.
Submitted 9 March, 2012;
originally announced March 2012.
-
Self Consistency: A General Recipe for Wavelet Estimation With Irregularly-spaced and/or Incomplete Data
Authors:
Thomas C. M. Lee,
Xiao-Li Meng
Abstract:
Inspired by the key principle behind the EM algorithm, we propose a general methodology for conducting wavelet estimation with irregularly-spaced data by viewing the data as the observed portion of an augmented regularly-spaced data set. We then invoke the self-consistency principle to define our wavelet estimators in the presence of incomplete data. Major advantages of this approach include: (i) it can be coupled with almost any wavelet shrinkage method, (ii) it can deal with non-Gaussian or correlated noise, and (iii) it can automatically handle other kinds of missing or incomplete observations. We also develop a multiple-imputation algorithm and fast EM-type algorithms for computing or approximating such estimates. Results from numerical experiments suggest that our algorithms produce favorable results when compared with several common methods, and therefore we hope these empirical findings will motivate subsequent theoretical investigations. To illustrate the flexibility of our approach, examples with Poisson data smoothing and image denoising are also provided.
Submitted 6 January, 2007;
originally announced January 2007.