Showing 1–22 of 22 results for author: Hocking, T

Searching in archive stat.
  1. arXiv:2505.07413

    cs.LG stat.AP

    Learning Penalty for Optimal Partitioning via Automatic Feature Extraction

    Authors: Tung L Nguyen, Toby Hocking

    Abstract: Changepoint detection identifies significant shifts in data sequences, making it important in areas like finance, genetics, and healthcare. The Optimal Partitioning algorithms efficiently detect these changes, using a penalty parameter to limit the changepoints number. Determining the appropriate value for this penalty can be challenging. Traditionally, this process involved manually extracting st…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 9 Figures

  2. arXiv:2503.02011

    cs.LG stat.ML

    Interval Regression: A Comparative Study with Proposed Models

    Authors: Tung L Nguyen, Toby Dylan Hocking

    Abstract: Regression models are essential for a wide range of real-world applications. However, in practice, target values are not always precisely known; instead, they may be represented as intervals of acceptable values. This challenge has led to the development of Interval Regression models. In this study, we provide a comprehensive review of existing Interval Regression models and introduce alternative…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 13 pages, 4 figures

  3. arXiv:2410.08654

    cs.LG stat.CO

    Finite Sample Complexity Analysis of Binary Segmentation

    Authors: Toby Dylan Hocking

    Abstract: Binary segmentation is the classic greedy algorithm which recursively splits a sequential data set by optimizing some loss or likelihood function. Binary segmentation is widely used for changepoint detection in data sets measured over space or time, and as a sub-routine for decision tree learning. In theory it should be extremely fast for $N$ data and $K$ splits, $O(N K)$ in the worst case, and…

    Submitted 11 October, 2024; originally announced October 2024.

  4. arXiv:2410.08643

    stat.ML cs.AI cs.LG

    SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets

    Authors: Toby Dylan Hocking, Gabrielle Thibault, Cameron Scott Bodine, Paul Nelson Arellano, Alexander F Shenkin, Olivia Jasmine Lindly

    Abstract: In many real-world applications of machine learning, we are interested to know if it is possible to train on the data that we have gathered so far, and obtain accurate predictions on a new test data subset that is qualitatively different in some respect (time period, geographic region, etc). Another question is whether data subsets are similar enough so that it is beneficial to combine subsets dur…

    Submitted 11 October, 2024; originally announced October 2024.

  5. arXiv:2410.08635

    cs.LG cs.AI stat.ML

    Efficient line search for optimizing Area Under the ROC Curve in gradient descent

    Authors: Jadon Fowler, Toby Dylan Hocking

    Abstract: Receiver Operating Characteristic (ROC) curves are useful for evaluation in binary classification and changepoint detection, but difficult to use for learning since the Area Under the Curve (AUC) is piecewise constant (gradient zero almost everywhere). Recently the Area Under Min (AUM) of false positive and false negative rates has been proposed as a differentiable surrogate for AUC. In this paper…

    Submitted 11 October, 2024; originally announced October 2024.

  6. arXiv:2408.00856

    stat.ML cs.LG

    Enhancing Changepoint Detection: Penalty Learning through Deep Learning Techniques

    Authors: Tung L Nguyen, Toby Dylan Hocking

    Abstract: Changepoint detection, a technique for identifying significant shifts within data sequences, is crucial in various fields such as finance, genomics, medicine, etc. Dynamic programming changepoint detection algorithms are employed to identify the locations of changepoints within a sequence, which rely on a penalty parameter to regulate the number of changepoints. To estimate this penalty parameter,…

    Submitted 17 September, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 17 pages, 7 figures

  7. arXiv:2210.02580

    cs.LG stat.ML

    Functional Labeled Optimal Partitioning

    Authors: Toby D. Hocking, Jacob M. Kaufman, Alyssa J. Stenberg

    Abstract: Peak detection is a problem in sequential data analysis that involves differentiating regions with higher counts (peaks) from regions with lower counts (background noise). It is crucial to correctly predict areas that deviate from the background noise, in both the train and test sets of labels. Dynamic programming changepoint algorithms have been proposed to solve the peak detection problem by…

    Submitted 5 October, 2022; originally announced October 2022.

  8. arXiv:2107.01285

    stat.ML cs.LG

    Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection

    Authors: Jonathan Hillman, Toby Dylan Hocking

    Abstract: Receiver Operating Characteristic (ROC) curves are plots of true positive rate versus false positive rate which are useful for evaluating binary classification models, but difficult to use for learning since the Area Under the Curve (AUC) is non-convex. ROC curves can also be used in other problems that have false positive and true positive rates such as changepoint detection. We show that in this…

    Submitted 2 July, 2021; originally announced July 2021.

  9. arXiv:2012.06848

    q-bio.QM stat.ML

    Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models

    Authors: Arnaud Liehrmann, Guillem Rigaill, Toby Dylan Hocking

    Abstract: Motivation: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In early 2000s, a powerful technique has emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated to these modifications. In order to realize the full potential of this technique, in…

    Submitted 15 December, 2020; v1 submitted 12 December, 2020; originally announced December 2020.

    Comments: 20 pages, 8 figures; updated broken citations and references

  10. arXiv:2006.13967

    stat.ML cs.LG

    Labeled Optimal Partitioning

    Authors: Toby Dylan Hocking, Anuraag Srivastava

    Abstract: In data sequences measured over space or time, an important problem is accurate detection of abrupt changes. In partially labeled data, it is important to correctly predict presence/absence of changes in positive/negative labeled regions, in both the train and test sets. One existing dynamic programming algorithm is designed for prediction in unlabeled test regions (and ignores the labels in the t…

    Submitted 24 June, 2020; originally announced June 2020.

  11. arXiv:2006.04920

    cs.LG stat.ML

    Survival regression with accelerated failure time model in XGBoost

    Authors: Avinash Barnwal, Hyunsu Cho, Toby Dylan Hocking

    Abstract: Survival regression is used to estimate the relation between time-to-event and feature variables, and is important in application domains such as medicine, marketing, risk management and sales management. Nonlinear tree based machine learning algorithms as implemented in libraries such as XGBoost, scikit-learn, LightGBM, and CatBoost are often more accurate in practice than linear models. However,…

    Submitted 21 August, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

  12. arXiv:2004.13558

    eess.SP cs.LG stat.ML

    A Graph-constrained Changepoint Detection Approach for ECG Segmentation

    Authors: Atiyeh Fotoohinasab, Toby Hocking, Fatemeh Afghah

    Abstract: Electrocardiogram (ECG) signal is the most commonly used non-invasive tool in the assessment of cardiovascular diseases. Segmentation of the ECG signal to locate its constitutive waves, in particular the R-peaks, is a key step in ECG processing and analysis. Over the years, several segmentation and QRS complex detection algorithms have been proposed with different features; however, their performa…

    Submitted 24 April, 2020; originally announced April 2020.

  13. arXiv:2003.02808

    cs.LG cs.DS stat.ML

    Linear time dynamic programming for the exact path of optimal models selected from a finite set

    Authors: Toby Hocking, Joseph Vargovich

    Abstract: Many learning algorithms are formulated in terms of finding model parameters which minimize a data-fitting loss function plus a regularizer. When the regularizer involves the l0 pseudo-norm, the resulting regularization path consists of a finite set of models. The fastest existing algorithm for computing the breakpoints in the regularization path is quadratic in the number of models, so it scales…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: 14 pages

  14. arXiv:2002.03646

    stat.CO

    gfpop: an R Package for Univariate Graph-Constrained Change-Point Detection

    Authors: Vincent Runge, Toby Dylan Hocking, Gaetano Romano, Fatemeh Afghah, Paul Fearnhead, Guillem Rigaill

    Abstract: In a world with data that change rapidly and abruptly, it is important to detect those changes accurately. In this paper we describe an R package implementing a generalized version of an algorithm recently proposed by Hocking et al. [2020] for penalized maximum likelihood inference of constrained multiple change-point models. This algorithm can be used to pinpoint the precise locations of abrupt c…

    Submitted 11 April, 2022; v1 submitted 10 February, 2020; originally announced February 2020.

    MSC Class: 62M10; 60J22

  15. arXiv:1810.00117

    stat.CO

    Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

    Authors: Toby Dylan Hocking, Guillem Rigaill, Paul Fearnhead, Guillaume Bourque

    Abstract: We describe a new algorithm and R package for peak detection in genomic data sets using constrained changepoint algorithms. These detect changes from background to peak regions by imposing the constraint that the mean should alternately increase then decrease. An existing algorithm for this problem exists, and gives state-of-the-art accuracy results, but it is computationally expensive when the nu…

    Submitted 28 September, 2018; originally announced October 2018.

  16. arXiv:1802.07380

    stat.ME q-bio.NC stat.AP

    Fast Nonconvex Deconvolution of Calcium Imaging Data

    Authors: Sean Jewell, Toby Dylan Hocking, Paul Fearnhead, Daniela Witten

    Abstract: Calcium imaging data promises to transform the field of neuroscience by making it possible to record from large populations of neurons simultaneously. However, determining the exact moment in time at which a neuron spikes, from a calcium imaging data set, amounts to a non-trivial deconvolution problem which is of critical importance for downstream analyses. While a number of formulations have been…

    Submitted 20 February, 2018; originally announced February 2018.

    Comments: 30 pages, 9 figures

  17. arXiv:1710.04234

    stat.ML cs.DS cs.LG stat.AP

    Maximum Margin Interval Trees

    Authors: Alexandre Drouin, Toby Dylan Hocking, François Laviolette

    Abstract: Learning a regression function using censored or interval-valued output data is an important problem in fields such as genomics and medicine. The goal is to learn a real-valued prediction function, and the training output labels indicate an interval of possible values. Whereas most existing algorithms for this task are linear models, in this paper we investigate learning nonlinear tree models. We…

    Submitted 27 October, 2017; v1 submitted 11 October, 2017; originally announced October 2017.

    Comments: Accepted for presentation at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA

  18. arXiv:1703.03352

    stat.CO q-bio.GN stat.ML

    A log-linear time algorithm for constrained changepoint detection

    Authors: Toby Dylan Hocking, Guillem Rigaill, Paul Fearnhead, Guillaume Bourque

    Abstract: Changepoint detection is a central problem in time series and genomic data. For some applications, it is natural to impose constraints on the directions of changes. One example is ChIP-seq data, for which adding an up-down constraint improves peak detection accuracy, but makes the optimization problem more complicated. We show how a recently proposed functional pruning technique can be adapted to…

    Submitted 9 March, 2017; originally announced March 2017.

  19. arXiv:1509.00368

    math.ST stat.ME

    A breakpoint detection error function for segmentation model selection and evaluation

    Authors: Toby Dylan Hocking

    Abstract: We consider the multiple breakpoint detection problem, which is concerned with detecting the locations of several distinct changes in a one-dimensional noisy data series. We propose the breakpointError, a function that can be used to evaluate estimated breakpoint locations, given the known locations of true breakpoints. We discuss an application of the breakpointError for finding optimal penalties…

    Submitted 1 September, 2015; originally announced September 2015.

  20. arXiv:1506.01286

    stat.ML q-bio.GN

    PeakSegJoint: fast supervised peak detection via joint segmentation of multiple count data samples

    Authors: Toby Dylan Hocking, Guillaume Bourque

    Abstract: Joint peak detection is a central problem when comparing samples in genomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegJoint, a new constrained maximum likelihood segmentation model for any number of sample types. To select the number of peaks in the segmentation, we propose a supervised penalty learning model. To i…

    Submitted 3 June, 2015; originally announced June 2015.

    Comments: 11 pages, 5 figures

  21. arXiv:1409.1842

    stat.ME stat.CO

    On Optimal Multiple Changepoint Algorithms for Large Data

    Authors: Robert Maidstone, Toby Hocking, Guillem Rigaill, Paul Fearnhead

    Abstract: There is an increasing need for algorithms that can accurately detect changepoints in long time-series, or equivalent, data. Many common approaches to detecting changepoints, for example based on penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. Dynamic programming methods exist to solve this minimisation problem exactly, but th…

    Submitted 5 September, 2014; originally announced September 2014.

    Comments: 20 pages

    MSC Class: 62M10

  22. arXiv:1401.8008

    stat.ML cs.LG

    Support vector comparison machines

    Authors: David Venuto, Toby Dylan Hocking, Lakjaree Sphanurattana, Masashi Sugiyama

    Abstract: In ranking problems, the goal is to learn a ranking function from labeled pairs of input points. In this paper, we consider the related comparison problem, where the label indicates which element of the pair is better, or if there is no significant difference. We cast the learning problem as a margin maximization, and show that it can be solved by converting it to a standard SVM. We use simulated…

    Submitted 23 July, 2020; v1 submitted 30 January, 2014; originally announced January 2014.