Search | arXiv e-print repository

Change Point Detection in Pairwise Comparison Data with Covariates

Abstract: This paper introduces the novel piecewise stationary covariate-assisted ranking estimation (PS-CARE) model for analyzing time-evolving pairwise comparison data, enhancing item ranking accuracy through the integration of covariate information. By partitioning the data into distinct, stationary segments, the PS-CARE model adeptly detects temporal shifts in item rankings, known as change points, whose number and positions are initially unknown. Leveraging the minimum description length (MDL) principle, this paper establishes a statistically consistent model selection criterion to estimate these unknowns. The practical optimization of this MDL criterion is done with the pruned exact linear time (PELT) algorithm. Empirical evaluations reveal the method's promising performance in accurately locating change points across various simulated scenarios. An application to an NBA dataset yielded meaningful insights that aligned with significant historical events, highlighting the method's practical utility and the MDL criterion's effectiveness in capturing temporal ranking changes. To the best of the authors' knowledge, this research pioneers change point detection in pairwise comparison data with covariate information, representing a significant leap forward in the field of dynamic ranking analysis.

Submitted 24 August, 2024; originally announced August 2024.
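The abstract states that the MDL criterion is optimized with PELT. As an illustration of how PELT works, here is a minimal generic sketch, assuming a univariate Gaussian mean-shift model with a squared-error segment cost and a constant penalty per change point. The PS-CARE segment cost (derived from the covariate-assisted ranking likelihood) and the exact MDL penalty are not reproduced, and the helper name `pelt` is hypothetical.

```python
from typing import List


def pelt(y: List[float], penalty: float) -> List[int]:
    """Return change point indices (start of each new segment) for a
    piecewise-constant mean, using PELT with a squared-error segment cost."""
    n = len(y)
    # Prefix sums give O(1) segment costs.
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(s: int, t: int) -> float:
        # Sum of squared deviations of y[s:t] from its own mean.
        total = prefix[t] - prefix[s]
        return (prefix_sq[t] - prefix_sq[s]) - total * total / (t - s)

    F = [-penalty] + [0.0] * n  # F[t]: optimal penalized cost of y[:t]
    last = [0] * (n + 1)        # last[t]: start of the final segment of y[:t]
    candidates = [0]            # admissible positions of the most recent change
    for t in range(1, n + 1):
        scored = [(F[s] + cost(s, t) + penalty, s) for s in candidates]
        F[t], last[t] = min(scored)
        # Pruning step: a candidate whose unpenalized cost already exceeds F[t]
        # can never be optimal for a later t (the PELT constant K is 0 here).
        candidates = [s for value, s in scored if value - penalty <= F[t]]
        candidates.append(t)

    # Backtrack through last[] to recover the segmentation.
    change_points, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            change_points.append(t)
    return sorted(change_points)


if __name__ == "__main__":
    print(pelt([0.0] * 50 + [5.0] * 50, penalty=10.0))
```

Running the script prints `[50]`, the single change between the two constant regimes. Swapping in a different `cost` (for example, a maximized negative log-likelihood from a ranking model) leaves the dynamic program and pruning step unchanged, provided that splitting a segment never increases its total cost.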

arXiv:1911.06177

Uncertainty Quantification in Ensembles of Honest Regression Trees using Generalized Fiducial Inference

Authors: Suofei Wu, Jan Hannig, Thomas C. M. Lee

Abstract: Owing to their accuracy, methods based on ensembles of regression trees are a popular approach for making predictions. Common examples include Bayesian additive regression trees, boosting and random forests. This paper focuses on honest random forests, which add honesty to the original form of random forests and have been proved to have better statistical properties. The main contribution is a new method that quantifies the uncertainty of the estimates and predictions produced by honest random forests. The proposed method is based on the generalized fiducial methodology and provides a fiducial density function that measures how likely each individual honest tree is to be the true model. With such a density function, estimates and predictions, as well as their confidence/prediction intervals, can be obtained. The promising empirical properties of the proposed method are demonstrated by numerical comparisons with several state-of-the-art methods and by applications to a few real data sets. Lastly, the proposed method is theoretically supported by a strong asymptotic guarantee.

Submitted 14 November, 2019; originally announced November 2019.
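To make "honesty" concrete: an honest tree uses one subsample to choose its splits and a disjoint subsample to estimate its leaf values. The sketch below shows this for a single-split regression tree on one feature. The 50/50 sample split and the function names `best_threshold` and `fit_honest_stump` are illustrative assumptions, and the generalized fiducial density over trees, which is the paper's main contribution, is not implemented here.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple


def best_threshold(xs: List[float], ys: List[float]) -> Optional[float]:
    """Threshold on a 1-D feature that minimizes the two-leaf squared error."""
    pairs = sorted(zip(xs, ys))
    best_sse, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best_sse:
            best_sse = sse
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr


def fit_honest_stump(xs: List[float], ys: List[float],
                     seed: int = 0) -> Tuple[Optional[float], float, float]:
    """Honest one-split tree: one half of the data picks the split,
    the other (disjoint) half estimates the two leaf means."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    split_idx, est_idx = idx[:half], idx[half:]
    thr = best_threshold([xs[i] for i in split_idx], [ys[i] for i in split_idx])
    fallback = mean(ys[i] for i in est_idx)  # used if a leaf gets no estimation data
    if thr is None:
        return None, fallback, fallback
    left = [ys[i] for i in est_idx if xs[i] <= thr]
    right = [ys[i] for i in est_idx if xs[i] > thr]
    return (thr,
            mean(left) if left else fallback,
            mean(right) if right else fallback)


if __name__ == "__main__":
    xs = [i / 100 for i in range(200)]
    ys = [0.0 if x < 1.0 else 3.0 for x in xs]
    print(fit_honest_stump(xs, ys))
```

Because the leaf values never see the data that chose the split, the leaf estimates are not biased toward the split search; this separation is what the honesty property refers to.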

arXiv:1908.01251

Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting

Authors: Miles E. Lopes, Suofei Wu, Thomas C. M. Lee

Abstract: When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression, which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.

Submitted 3 August, 2019; originally announced August 2019.

Comments: 36 pages
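The core bootstrap idea can be sketched as follows, assuming the trees are treated as independent draws given the training data: resampling t trees with replacement from the t fitted trees mimics drawing a fresh ensemble of size t, so the spread of the resampled test error around the current error stands in for the gap between the current error and the infinite-ensemble error. This is a simplified illustration; the paper's exact procedure and any refinements may differ, and the names `ensemble_mse` and `convergence_width` are hypothetical.

```python
import random
from typing import Sequence


def ensemble_mse(preds: Sequence[Sequence[float]], y: Sequence[float],
                 members: Sequence[int]) -> float:
    """Test MSE of the ensemble that averages the listed members' predictions.
    preds[b][j] is tree b's prediction at test point j."""
    total = 0.0
    for j, target in enumerate(y):
        avg = sum(preds[b][j] for b in members) / len(members)
        total += (avg - target) ** 2
    return total / len(y)


def convergence_width(preds: Sequence[Sequence[float]], y: Sequence[float],
                      n_boot: int = 200, alpha: float = 0.1, seed: int = 0) -> float:
    """Bootstrap estimate of how far the current ensemble's test MSE may sit
    from that of an idealized infinite ensemble: the (1 - alpha) quantile of
    |MSE(resampled ensemble) - MSE(current ensemble)|."""
    rng = random.Random(seed)
    t = len(preds)
    base = ensemble_mse(preds, y, range(t))
    deviations = sorted(
        abs(ensemble_mse(preds, y, [rng.randrange(t) for _ in range(t)]) - base)
        for _ in range(n_boot)
    )
    return deviations[min(n_boot - 1, int((1 - alpha) * n_boot))]
```

A practitioner could, for instance, keep adding trees until the returned width falls below a tolerance fixed in advance.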

arXiv:1609.08882

doi 10.3150/14-BEJ671

Piecewise quantile autoregressive modeling for nonstationary time series

Authors: Alexander Aue, Rex C. Y. Cheung, Thomas C. M. Lee, Ming Zhong

Abstract: We develop a new methodology for fitting nonstationary time series that exhibit nonlinearity, asymmetry, local persistence and changes in the location, scale and shape of the underlying distribution. To achieve this goal, we perform model selection in the class of piecewise stationary quantile autoregressive processes. The best model is defined as the minimizer of a minimum description length criterion derived from an asymmetric Laplace likelihood. Its practical minimization is done with genetic algorithms. If the data-generating process does indeed follow a piecewise quantile autoregression structure, we show that our method is consistent for estimating the break points and the autoregressive parameters. Empirical work suggests that the proposed method performs well in finite samples.

Submitted 28 September, 2016; originally announced September 2016.

Comments: Published at http://dx.doi.org/10.3150/14-BEJ671 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ671

Journal ref: Bernoulli 2017, Vol. 23, No. 1, 1-22
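The criterion above is built on an asymmetric Laplace likelihood. The sketch below shows the standard link between that likelihood and the quantile check loss, assuming the common parameterization f(u) = τ(1 − τ)/σ · exp(−ρ_τ(u)/σ) with the scale σ profiled out. The MDL penalty terms, the autoregressive structure and the genetic-algorithm search are not reproduced, and all function names are hypothetical.

```python
import math
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_neg_loglik(residuals: Sequence[float], tau: float) -> float:
    """Negative log-likelihood under the asymmetric Laplace density
    f(u) = tau * (1 - tau) / sigma * exp(-rho_tau(u) / sigma),
    with sigma replaced by its maximum likelihood estimate sum(rho_tau) / n."""
    n = len(residuals)
    s = sum(check_loss(u, tau) for u in residuals)
    if s <= 0:
        raise ValueError("need at least one nonzero residual")
    sigma = s / n
    return -n * math.log(tau * (1 - tau) / sigma) + s / sigma


def best_constant_quantile(y: Sequence[float], tau: float) -> float:
    """Minimize the summed check loss over a constant fit. The objective is
    convex and piecewise linear with kinks at the data, so searching the
    data points suffices; the minimizer is an empirical tau-quantile."""
    return min(sorted(y), key=lambda c: sum(check_loss(v - c, tau) for v in y))
```

Since the profiled likelihood is a monotone function of the summed check loss, maximizing it within a segment amounts to quantile regression; for example, `best_constant_quantile([1.0, ..., 10.0], 0.25)` picks 3.0, the empirical lower quartile.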

arXiv:1203.2087

doi 10.1214/11-AOS925

On image segmentation using information theoretic criteria

Authors: Alexander Aue, Thomas C. M. Lee

Abstract: Image segmentation is a long-studied and important problem in image processing. Different solutions have been proposed, many of which follow the information theoretic paradigm. While these information theoretic segmentation methods often produce excellent empirical results, their theoretical properties are still largely unknown. The main goal of this paper is to conduct a rigorous theoretical study into the statistical consistency properties of such methods. More specifically, this paper investigates whether these methods can accurately recover the true number of segments together with their true boundaries in the image as the number of pixels tends to infinity. Our theoretical results show that both the Bayesian information criterion (BIC) and the minimum description length (MDL) principle can be applied to derive statistically consistent segmentation methods, while the same is not true for the Akaike information criterion (AIC). Numerical experiments were conducted to illustrate and support our theoretical findings.

Submitted 9 March, 2012; originally announced March 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AOS925 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS925

Journal ref: Annals of Statistics 2011, Vol. 39, No. 6, 2912-2935
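For orientation, the textbook forms of two of the criteria compared in the abstract are shown below, for a candidate segmentation with $k$ free parameters, maximized likelihood $\hat{L}_k$ and $n$ pixels. This is a generic sketch; the paper's segmentation-specific criteria, and the MDL code length in particular, may contain additional terms (for example, ones depending on the number and shapes of the regions) that are not reproduced here.

```latex
\mathrm{AIC}(k) = -2\log \hat{L}_k + 2k,
\qquad
\mathrm{BIC}(k) = -2\log \hat{L}_k + k\log n .
```

One informal reading that is consistent with the abstract's findings: the BIC penalty per parameter grows with $n$, whereas the AIC penalty stays fixed, so AIC is more willing to accept spurious extra segments as the number of pixels grows.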

arXiv:math/0701196

Self Consistency: A General Recipe for Wavelet Estimation With Irregularly-spaced and/or Incomplete Data

Authors: Thomas C. M. Lee, Xiao-Li Meng

Abstract: Inspired by the key principle behind the EM algorithm, we propose a general methodology for conducting wavelet estimation with irregularly-spaced data by viewing the data as the observed portion of an augmented regularly-spaced data set. We then invoke the self-consistency principle to define our wavelet estimators in the presence of incomplete data. Major advantages of this approach include: (i) it can be coupled with almost any wavelet shrinkage method, (ii) it can deal with non-Gaussian or correlated noise, and (iii) it can automatically handle other kinds of missing or incomplete observations. We also develop a multiple-imputation algorithm and fast EM-type algorithms for computing or approximating such estimates. Results from numerical experiments suggest that our algorithms produce favorable results when compared with several common methods, and we hope these empirical findings will motivate subsequent theoretical investigations. To illustrate the flexibility of our approach, examples with Poisson data smoothing and image denoising are also provided.

Submitted 6 January, 2007; originally announced January 2007.
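The self-consistency idea can be sketched as a fixed-point iteration: place the data on a regular grid, fill the missing cells with the current estimate, apply a wavelet smoother, and repeat until the estimate reproduces itself. The sketch assumes an orthonormal Haar wavelet, soft thresholding at a user-chosen threshold `lam`, and a grid whose length is a power of two. These choices and all function names are illustrative, and the paper's multiple-imputation and EM-type algorithms are not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(x: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Orthonormal Haar transform; returns (coarsest approximation, details finest-first)."""
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / SQRT2 for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
    return approx, details


def haar_inverse(approx: List[float], details: List[List[float]]) -> List[float]:
    """Invert haar_forward, rebuilding from the coarsest level upward."""
    x = list(approx)
    for level in reversed(details):
        x = [v for a, d in zip(x, level) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


def soft_threshold_smooth(x: Sequence[float], lam: float) -> List[float]:
    """Wavelet shrinkage: soft-threshold every detail coefficient at lam."""
    approx, details = haar_forward(x)
    shrunk = [[math.copysign(max(abs(c) - lam, 0.0), c) for c in level]
              for level in details]
    return haar_inverse(approx, shrunk)


def self_consistent_estimate(y: Sequence[Optional[float]], lam: float,
                             n_iter: int = 100) -> List[float]:
    """y holds values on a regular grid of length 2**J, with None where data are missing."""
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("grid length must be a power of two")
    observed = [v for v in y if v is not None]
    f = [sum(observed) / len(observed)] * n  # initial guess: mean of observed data
    for _ in range(n_iter):
        # Fill missing cells with the current estimate, then re-apply the smoother.
        completed = [v if v is not None else f[i] for i, v in enumerate(y)]
        f = soft_threshold_smooth(completed, lam)
    return f
```

Any other shrinkage rule can replace the soft thresholding inside `soft_threshold_smooth` without touching the outer loop, which mirrors point (i) of the abstract.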

Showing 1–6 of 6 results for author: Lee, T C M