-
MONSTER: Monash Scalable Time Series Evaluation Repository
Authors:
Angus Dempster,
Navid Mohammadi Foumani,
Chang Wei Tan,
Lynn Miller,
Amish Mishra,
Mahsa Salehi,
Charlotte Pelletier,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
We introduce MONSTER (the MONash Scalable Time Series Evaluation Repository), a collection of large datasets for time series classification. The field of time series classification has benefitted from common benchmarks set by the UCR and UEA time series classification repositories. However, the datasets in these benchmarks are small, with median sizes of 217 and 255 examples, respectively. In consequence, they favour a narrow subspace of models that are optimised to achieve low classification error on a wide variety of smaller datasets, that is, models that minimise variance, and give little weight to computational issues such as scalability. Our hope is to diversify the field by introducing benchmarks using larger datasets. We believe that there is enormous potential for new progress in the field by engaging with the theoretical and practical challenges of learning effectively from larger quantities of data.
Submitted 20 February, 2025;
originally announced February 2025.
-
Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data
Authors:
Angus Dempster,
Geoffrey I. Webb,
Daniel F. Schmidt
Abstract:
Logistic regression is a ubiquitous method for probabilistic classification. However, the effectiveness of logistic regression depends upon careful and relatively computationally expensive tuning, especially for the regularisation hyperparameter, and especially in the context of high-dimensional data. We present a prevalidated ridge regression model that closely matches logistic regression in terms of classification error and log-loss, particularly for high-dimensional data, while being significantly more computationally efficient and having effectively no hyperparameters beyond regularisation. We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions derived from the estimated leave-one-out cross-validation error. This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.
Submitted 3 April, 2025; v1 submitted 28 January, 2024;
originally announced January 2024.
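The abstract above relies on leave-one-out (LOO) quantities that come almost for free when fitting ridge regression. A minimal sketch of that one ingredient follows, using the standard hat-matrix identity for linear smoothers. It is not the authors' implementation: the function names, the absence of an intercept, and the fixed regularisation value `lam` are assumptions made for illustration, and the log-loss scaling step described in the abstract is not shown.

```python
import numpy as np


def ridge_loo_predictions(X, y, lam):
    """Leave-one-out predictions for ridge regression from a single fit.

    Assumes no intercept and lam > 0 (illustrative choices).
    """
    n_features = X.shape[1]
    # Hat matrix H = X (X^T X + lam I)^{-1} X^T maps y to fitted values.
    A = X.T @ X + lam * np.eye(n_features)
    H = X @ np.linalg.solve(A, X.T)
    y_hat = H @ y
    leverage = np.diag(H)
    # Identity: y_i - y_hat_loo_i = (y_i - y_hat_i) / (1 - h_ii).
    return y - (y - y_hat) / (1.0 - leverage)


def ridge_loo_brute_force(X, y, lam):
    """Reference version: refit the model n times, leaving one row out each time."""
    n_samples, n_features = X.shape
    predictions = np.empty(n_samples)
    for i in range(n_samples):
        keep = np.arange(n_samples) != i
        X_train, y_train = X[keep], y[keep]
        beta = np.linalg.solve(
            X_train.T @ X_train + lam * np.eye(n_features), X_train.T @ y_train
        )
        predictions[i] = X[i] @ beta
    return predictions
```

The brute-force function is included only to check the shortcut; the two should agree to numerical precision for any `lam > 0`.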
-
QUANT: A Minimalist Interval Method for Time Series Classification
Authors:
Angus Dempster,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
We show that it is possible to achieve the same accuracy, on average, as the most accurate existing interval methods for time series classification on a standard set of benchmark datasets using a single type of feature (quantiles), fixed intervals, and an 'off the shelf' classifier. This distillation of interval-based approaches represents a fast and accurate method for time series classification, achieving state-of-the-art accuracy on the expanded set of 142 datasets in the UCR archive with a total compute time (training and inference) of less than 15 minutes using a single CPU core.
Submitted 2 August, 2023;
originally announced August 2023.
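The idea in the abstract above (quantiles computed over fixed intervals) can be sketched in a few lines. This is an illustration only, not the QUANT implementation: the dyadic splitting scheme, the `depth` and `n_quantiles` parameters, and the function name are assumptions, and the 'off the shelf' classifier that would consume the features is not shown.

```python
import numpy as np


def interval_quantile_features(x, depth=3, n_quantiles=4):
    """Concatenate quantiles computed over fixed intervals of a series.

    Assumed scheme: at level d the series is split into 2**d contiguous
    intervals. The series must have at least 2**(depth - 1) points.
    """
    x = np.asarray(x, dtype=float)
    levels = np.linspace(0.0, 1.0, n_quantiles)  # evenly spaced quantile levels
    features = []
    for d in range(depth):
        for segment in np.array_split(x, 2 ** d):
            features.extend(np.quantile(segment, levels))
    return np.array(features)
```

With the default arguments, each series yields (1 + 2 + 4) × 4 = 28 features, regardless of its length.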
-
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
Authors:
Paul S. Scotti,
Atmadeep Banerjee,
Jimmie Goode,
Stepan Shabalin,
Alex Nguyen,
Ethan Cohen,
Aidan J. Dempster,
Nathalie Verlinde,
Elad Yundler,
David Weisberg,
Kenneth A. Norman,
Tanishq Mathew Abraham
Abstract:
We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img, with outputs from a separate autoencoder. All code is available on GitHub.
Submitted 7 October, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set
Authors:
Ali Ismail-Fawaz,
Angus Dempster,
Chang Wei Tan,
Matthieu Herrmann,
Lynn Miller,
Daniel F. Schmidt,
Stefano Berretti,
Jonathan Weber,
Maxime Devanne,
Germain Forestier,
Geoffrey I. Webb
Abstract:
The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and intentional manipulation. To address these issues, we propose a new approach to presenting the results of benchmark comparisons, the Multiple Comparison Matrix (MCM), that prioritizes pairwise comparisons and precludes the means of manipulating experimental results in existing approaches. MCM can be used to show the results of an all-pairs comparison, or to show the results of a comparison between one or more selected algorithms and the state of the art. MCM is implemented in Python and is publicly available.
Submitted 19 May, 2023;
originally announced May 2023.
-
Efficient low-thrust trajectory data generation based on generative adversarial network
Authors:
Ruida Xie,
Andrew G. Dempster
Abstract:
Deep learning-based techniques have been introduced into the field of trajectory optimization in recent years. Deep Neural Networks (DNNs) are trained and used as surrogates for the conventional optimization process. They can provide low thrust (LT) transfer cost estimation and enable more complex preliminary mission designs. However, it is a challenge to efficiently obtain the required amount of trajectory data for training. A Generative Adversarial Network (GAN) is adapted to generate feasible LT trajectory data efficiently. The GAN consists of a generator and a discriminator, both of which are deep networks. The generator generates fake LT transfer features using random noise as input, while the discriminator distinguishes the generator's fake LT transfer features from real LT transfer features. The GAN is trained until the generator generates fake LT transfers that the discriminator cannot identify. This indicates that the generator generates LT transfer features that have the same distribution as the real transfer features. The generated LT transfer data have a high convergence rate, and they can be used to efficiently produce training data for deep learning models. The proposed approach is validated by generating feasible LT transfers in a Near-Earth Asteroid (NEA) mission scenario. The convergence rate of GAN-generated samples is 84.3%.
Submitted 14 September, 2022;
originally announced September 2022.
-
HYDRA: Competing convolutional kernels for fast and accurate time series classification
Authors:
Angus Dempster,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
We demonstrate a simple connection between dictionary methods for time series classification, which involve extracting and counting symbolic patterns in time series, and methods based on transforming input time series using convolutional kernels, namely ROCKET and its variants. We show that by adjusting a single hyperparameter it is possible to move by degrees between models resembling dictionary methods and models resembling ROCKET. We present HYDRA, a simple, fast, and accurate dictionary method for time series classification using competing convolutional kernels, combining key aspects of both ROCKET and conventional dictionary methods. HYDRA is faster and more accurate than the most accurate existing dictionary methods, and can be combined with ROCKET and its variants to further improve the accuracy of these methods.
Submitted 25 March, 2022;
originally announced March 2022.
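One way to read "competing convolutional kernels" combined with dictionary-style counting, as described in the abstract above, is sketched here: at each position in the series, the kernel in a group with the largest response "wins", and the wins are counted per kernel. This is an interpretation for illustration, not the HYDRA implementation; the function name and the use of a single group of kernels are assumptions.

```python
import numpy as np


def competing_kernel_counts(x, kernels):
    """Count, for each kernel, how often it has the largest response.

    `kernels` is a sequence of 1-D weight arrays of equal length, no longer
    than the series `x` (assumed layout for this sketch).
    """
    x = np.asarray(x, dtype=float)
    # responses[k, t] is the response of kernel k at position t.
    responses = np.stack([np.correlate(x, w, mode="valid") for w in kernels])
    winners = responses.argmax(axis=0)  # index of the winning kernel per position
    return np.bincount(winners, minlength=len(kernels))
```

The resulting count vector plays the role of a dictionary-method histogram, with kernels standing in for symbolic patterns.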
-
Feasible Low-thrust Trajectory Identification via a Deep Neural Network Classifier
Authors:
Ruida Xie,
Andrew G. Dempster
Abstract:
In recent years, deep learning techniques have been introduced into the field of trajectory optimization to improve convergence and speed. Training such models requires large trajectory datasets. However, the convergence of low thrust (LT) optimizations is unpredictable before the optimization process ends. For randomly initialized low thrust transfer data generation, most of the computation power will be wasted on optimizing infeasible low thrust transfers, which leads to an inefficient data generation process. This work proposes a deep neural network (DNN) classifier to accurately identify feasible LT transfers prior to the optimization process. The DNN classifier achieves an overall accuracy of 97.9%, the best performance among the tested algorithms. Accurate low-thrust trajectory feasibility identification avoids optimization of undesired samples, so that the majority of the optimized samples are LT trajectories that converge. This technique enables efficient dataset generation for different mission scenarios with different spacecraft configurations.
Submitted 10 February, 2022;
originally announced February 2022.
-
MultiRocket: Multiple pooling operators and transformations for fast and effective time series classification
Authors:
Chang Wei Tan,
Angus Dempster,
Christoph Bergmeir,
Geoffrey I. Webb
Abstract:
We propose MultiRocket, a fast time series classification (TSC) algorithm that achieves state-of-the-art performance with a tiny fraction of the time and without the complex ensembling structure of many state-of-the-art methods. MultiRocket improves on MiniRocket, one of the fastest TSC algorithms to date, by adding multiple pooling operators and transformations to improve the diversity of the features generated. In addition to processing the raw input series, MultiRocket also applies first order differences to transform the original series. Convolutions are applied to both representations, and four pooling operators are applied to the convolution outputs. When benchmarked using the University of California Riverside TSC benchmark datasets, MultiRocket is significantly more accurate than MiniRocket, and competitive with the best ranked current method in terms of accuracy, HIVE-COTE 2.0, while being orders of magnitude faster.
Submitted 21 February, 2022; v1 submitted 31 January, 2021;
originally announced February 2021.
-
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification
Authors:
Angus Dempster,
Daniel F. Schmidt,
Geoffrey I. Webb
Abstract:
Until recently, the most accurate methods for time series classification were limited by high computational complexity. ROCKET achieves state-of-the-art accuracy with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate ROCKET into a new method, MINIROCKET, making it up to 75 times faster on larger datasets, and making it almost deterministic (and optionally, with additional computational expense, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in less than 10 minutes. MINIROCKET is significantly faster than any other method of comparable accuracy (including ROCKET), and significantly more accurate than any other method of even roughly-similar computational expense. As such, we suggest that MINIROCKET should now be considered and used as the default variant of ROCKET.
Submitted 14 July, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Estimation of Orofacial Kinematics in Parkinson's Disease: Comparison of 2D and 3D Markerless Systems for Motion Tracking
Authors:
Diego L. Guarin,
Aidan Dempster,
Andrea Bandini,
Yana Yunusova,
Babak Taati
Abstract:
Orofacial deficits are common in people with Parkinson's disease (PD) and their evolution might represent an important biomarker of disease progression. We are developing an automated system for assessment of orofacial function in PD that can be used in-home or in-clinic and can provide useful and objective clinical information that informs disease management. Our current approach relies on color and depth cameras for the estimation of 3D facial movements. However, depth cameras are not commonly available, might be expensive, and require specialized software for control and data processing. The objective of this paper was to evaluate whether depth cameras are needed to differentiate between healthy controls and PD patients based on features extracted from orofacial kinematics. Results indicate that 2D features, extracted from color cameras only, are as informative as 3D features, extracted from color and depth cameras, for differentiating healthy controls from PD patients. These results pave the way for the development of a universal system for automatic and objective assessment of orofacial function in PD.
Submitted 18 March, 2020;
originally announced March 2020.
-
ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels
Authors:
Angus Dempster,
François Petitjean,
Geoffrey I. Webb
Abstract:
Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods.
Submitted 28 October, 2019;
originally announced October 2019.
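The transform described in the abstract above (random convolutional kernels whose outputs feed a linear classifier) can be sketched as follows. This is a simplified illustration, not the ROCKET implementation: the kernel length, the weight and bias distributions, the use of global max pooling as the only feature, and the function name are all assumptions, and the linear classifier itself is not shown.

```python
import numpy as np


def random_kernel_features(X, n_kernels=100, kernel_length=9, seed=0):
    """Map each series (row of X) to one feature per random kernel.

    Each series must have at least `kernel_length` points.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    features = np.empty((X.shape[0], n_kernels))
    for k in range(n_kernels):
        weights = rng.normal(size=kernel_length)
        weights -= weights.mean()  # centre the weights (illustrative choice)
        bias = rng.uniform(-1.0, 1.0)
        for i, series in enumerate(X):
            response = np.convolve(series, weights, mode="valid") + bias
            features[i, k] = response.max()  # global max pooling
    return features
```

The kernels are never trained; only the classifier fitted on the resulting feature matrix learns from the labels, which is where the computational saving comes from.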
-
Artificial Intelligence and Location Verification in Vehicular Networks
Authors:
Ullah Ihsan,
Ziqing Wang,
Robert Malaney,
Andrew Dempster,
Shihao Yan
Abstract:
Location information claimed by devices will play an ever-increasing role in future wireless networks such as 5G and the Internet of Things (IoT). Against this background, the verification of such claimed location information will be an issue of growing importance. A formal information-theoretic Location Verification System (LVS) can address this issue to some extent, but such a system usually operates within the limits of idealistic assumptions on a priori information on the proportion of genuine users in the field. In this work we address this critical limitation by using a Neural Network (NN), showing how such an NN-based LVS is capable of efficiently functioning even when the proportion of genuine users is completely unknown a priori. We demonstrate the improved performance of this new form of LVS based on Time of Arrival measurements from multiple verifying base stations within the context of vehicular networks, quantifying how our NN-LVS outperforms the stand-alone information-theoretic LVS in a range of anticipated real-world conditions. We also show the efficient performance of the NN-LVS when the users' signals have added Non-Line-of-Sight (NLoS) bias in them. This new LVS can be applied to a range of location-centric applications within the domain of the IoT.
Submitted 2 May, 2019; v1 submitted 9 January, 2019;
originally announced January 2019.
-
Adaptive Gray World-Based Color Normalization of Thin Blood Film Images
Authors:
F. Boray Tek,
Andrew G. Dempster,
İzzet Kale
Abstract:
This paper presents an effective color normalization method for thin blood film images of peripheral blood specimens. Thin blood film images can easily be separated into foreground (cell) and background (plasma) parts. The color of the plasma region is used to estimate and reduce the differences arising from different illumination conditions. A second stage normalization based on the database-gray world algorithm transforms the color of the foreground objects to match a reference color character. The quantitative experiments demonstrate the effectiveness of the method and its advantages over two other general purpose color correction methods: simple gray world and Retinex.
Submitted 14 July, 2016;
originally announced July 2016.
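For context, the "simple gray world" baseline named in the abstract above is commonly implemented by rescaling each color channel so that all channel means become equal. The sketch below shows that baseline only, not the paper's adaptive two-stage method; the function name and the height × width × channels array layout are assumptions, and every channel is assumed to have a nonzero mean.

```python
import numpy as np


def gray_world(image):
    """Rescale each channel so its mean equals the overall mean intensity.

    `image` is assumed to be an array of shape (height, width, channels).
    Output values are not clipped or rounded.
    """
    image = np.asarray(image, dtype=float)
    channel_means = image.reshape(-1, image.shape[-1]).mean(axis=0)
    gray = channel_means.mean()  # target value shared by all channels
    return image * (gray / channel_means)
```

In practice the result would usually be clipped back to the valid intensity range before display or storage.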