Search | arXiv e-print repository

SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding

Authors: Marcell T. Kurbucz, Nikolaos Tzivanakis, Nilufer Sari Aslam, Adam M. Sykulski

Abstract: Capturing nonlinear relationships without sacrificing interpretability remains a persistent challenge in regression modeling. We introduce SplitWise, a novel framework that enhances stepwise regression. It adaptively transforms numeric predictors into threshold-based binary features using shallow decision trees, but only when such transformations improve model fit, as assessed by the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This approach preserves the transparency of linear models while flexibly capturing nonlinear effects. Implemented as a user-friendly R package, SplitWise is evaluated on both synthetic and real-world datasets. The results show that it consistently produces more parsimonious and generalizable models than traditional stepwise and penalized regression techniques.

Submitted 21 May, 2025; originally announced May 2025.

Comments: 15 pages, 1 figure, 3 tables

MSC Class: 62H20; 62J05; 68T05 ACM Class: G.3; I.2.6; I.5.1; I.5.2
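
The adaptive encoding idea in the abstract above can be illustrated for a single predictor. The following is a minimal sketch, not the SplitWise R package's algorithm: the function names (`ols_aic`, `best_stump_threshold`, `choose_encoding`) are hypothetical, the threshold is found by an exhaustive depth-1 split search, and the AIC here does not charge an extra parameter for the searched threshold.

```python
import numpy as np

def ols_aic(X, y):
    """Gaussian AIC (up to an additive constant) of an OLS fit of y on X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def best_stump_threshold(x, y):
    """Threshold of the depth-1 regression split that minimises squared error."""
    xs = np.unique(x)
    candidates = (xs[:-1] + xs[1:]) / 2  # midpoints between sorted values

    def sse(t):
        left, right = y[x <= t], y[x > t]
        return np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)

    return float(min(candidates, key=sse))

def choose_encoding(x, y):
    """Keep the dummy 1[x > t] only if it gives a lower AIC than the linear term."""
    ones = np.ones_like(x)
    t = best_stump_threshold(x, y)
    aic_linear = ols_aic(np.column_stack([ones, x]), y)
    aic_dummy = ols_aic(np.column_stack([ones, (x > t).astype(float)]), y)
    return ("dummy", t) if aic_dummy < aic_linear else ("linear", None)

# Synthetic step-shaped response: the dummy encoding should be preferred.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2.0 * (x > 0.3) + rng.normal(0, 0.5, 200)
encoding, threshold = choose_encoding(x, y)
```

On data with a genuinely linear response the same comparison would be expected to retain the untransformed predictor, which is the "only when such transformations improve model fit" condition.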

arXiv:2412.01399 [pdf, ps, other]

Navigating Challenges in Spatio-temporal Modelling of Antarctic Krill Abundance: Addressing Zero-inflated Data and Misaligned Covariates

Authors: André Victor Ribeiro Amaral, Adam M. Sykulski, Sophie Fielding, Emma Cavan

Abstract: Antarctic krill (Euphausia superba) are among the most abundant species on our planet and serve as a vital food source for many marine predators in the Southern Ocean. In this paper, we utilise statistical spatio-temporal methods to combine data from various sources and resolutions, aiming to model krill abundance. Our focus lies in fitting the model to a dataset comprising acoustic measurements of krill biomass. To achieve this, we integrate climate covariates obtained from satellite imagery and from drifting surface buoys (also known as drifters). Additionally, we use sparsely collected krill biomass data obtained from net fishing efforts (KRILLBASE) for validation. However, integrating these multiple heterogeneous data sources presents significant modelling challenges, including spatio-temporal misalignment and inflated zeros in the observed data. To address these challenges, we fit a Hurdle-Gamma model to jointly describe the occurrence of zeros and the krill biomass for the non-zero observations, while also accounting for misaligned and heterogeneous data sources, including drifters. Therefore, our work presents a comprehensive framework for analysing and predicting krill abundance in the Southern Ocean, leveraging information from various sources and formats. This is crucial due to the impact of krill fishing, as understanding their distribution is essential for informed management decisions and fishing regulations aimed at protecting the species.

Submitted 17 June, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

arXiv:2411.19633 [pdf, other]

doi 10.1016/j.spasta.2025.100898

Isotropy testing in spatial point patterns: nonparametric versus parametric replication under misspecification

Authors: Jakub J. Pypkowski, Adam M. Sykulski, James S. Martin

Abstract: Several hypothesis testing methods have been proposed to validate the assumption of isotropy in spatial point patterns. A majority of these methods are characterised by an unknown distribution of the test statistic under the null hypothesis of isotropy. Parametric approaches to approximating the distribution involve simulation of patterns from a user-specified isotropic model. Alternatively, nonparametric replicates of the test statistic under isotropy can be used to waive the need for specifying a model. In this paper, we first present a general framework which allows for the integration of a selected nonparametric replication method into isotropy testing. We then conduct a large simulation study comprising application-like scenarios to assess the performance of tests with different parametric and nonparametric replication methods. In particular, we explore distortions in test size and power caused by model misspecification, and demonstrate the advantages of nonparametric replication in such scenarios.

Submitted 8 April, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

Comments: 24 pages, 13 figures, 3 tables

arXiv:2312.13643 [pdf, other]

Debiasing Welch's Method for Spectral Density Estimation

Authors: Lachlan C. Astfalck, Adam M. Sykulski, Edward J. Cripps

Abstract: Welch's method provides an estimator of the power spectral density that is statistically consistent. This is achieved by averaging over periodograms calculated from overlapping segments of a time series. For a finite length time series, while the variance of the estimator decreases as the number of segments increases, the magnitude of the estimator's bias increases: a bias-variance trade-off ensues when setting the segment number. We address this issue by providing a novel method for debiasing Welch's method which maintains the computational complexity and asymptotic consistency, and leads to improved finite-sample performance. Theoretical results are given for fourth-order stationary processes with finite fourth-order moments and absolutely convergent fourth-order cumulant function. The significant bias reduction is demonstrated with numerical simulation and an application to real-world data. Our estimator also permits irregular spacing over frequency and we demonstrate how this may be employed for signal compression and further variance reduction. Code accompanying this work is available in R and Python.

Submitted 10 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Resubmitted to Biometrika
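
For orientation, the standard (not debiased) Welch estimator described in the abstract above can be sketched as follows. This is an illustrative sketch only and is not the code accompanying the paper: the function name `welch_psd`, the Hann window, the 50% overlap default, and the normalisation (unit sampling interval, so unit-variance white noise has a flat density of about 1) are all assumptions made here.

```python
import numpy as np

def welch_psd(x, seg_len, overlap=0.5):
    """Average windowed periodograms over overlapping segments of x."""
    x = np.asarray(x, dtype=float)
    step = max(1, int(seg_len * (1 - overlap)))   # hop between segment starts
    window = np.hanning(seg_len)
    norm = np.sum(window ** 2)                    # keeps white noise at its variance
    starts = range(0, len(x) - seg_len + 1, step)
    pgrams = [
        np.abs(np.fft.rfft(window * x[s:s + seg_len])) ** 2 / norm
        for s in starts
    ]
    freqs = np.fft.rfftfreq(seg_len)              # cycles per sample, 0 to 0.5
    return freqs, np.mean(pgrams, axis=0)

# Unit-variance white noise: the estimate should be roughly flat around 1.
rng = np.random.default_rng(0)
noise = rng.standard_normal(8192)
freqs, psd = welch_psd(noise, seg_len=256)
```

Shortening `seg_len` gives more segments to average (lower variance) but a coarser, more biased estimate at each frequency, which is the trade-off the abstract refers to.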

arXiv:2204.06112 [pdf, other]

Analysing and visualising bike-sharing demand with outliers

Authors: Nicola Rennie, Catherine Cleophas, Adam M. Sykulski, Florian Dost

Abstract: Bike-sharing is a popular component of sustainable urban mobility. It requires anticipatory planning, e.g. of station locations and inventory, to balance expected demand and capacity. However, external factors, such as extreme weather or glitches in public transport, can cause demand to deviate from baseline levels. Identifying such outliers keeps historic data reliable and improves forecasts. In this paper we show how outliers can be identified by clustering stations and applying a functional depth analysis. We apply our analysis techniques to the Washington D.C. Capital Bikeshare data set as the running example throughout the paper, but our methodology is general by design. Furthermore, we offer an array of meaningful visualisations to communicate findings and highlight patterns in demand. Last but not least, we formulate managerial recommendations on how to use both the demand forecast and the identified outliers in the bike-sharing planning process.

Submitted 30 January, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

Comments: 32 pages

arXiv:2202.03773 [pdf, other]

doi 10.1093/jrsssc/qlad006

A multivariate pseudo-likelihood approach to estimating directional ocean wave models

Authors: Jake P. Grainger, Adam M. Sykulski, Kevin Ewans, Hans F. Hansen, Philip Jonathan

Abstract: Ocean buoy data in the form of high frequency multivariate time series are routinely recorded at many locations in the world's oceans. Such data can be used to characterise the ocean wavefield, which is important for numerous socio-economic and scientific reasons. This characterisation is typically achieved by modelling the frequency-direction spectrum, which decomposes spatiotemporal variability by both frequency and direction. State-of-the-art methods for estimating the parameters of such models do not make use of the full spatiotemporal content of the buoy observations due to unnecessary assumptions and smoothing steps. We explain how the multivariate debiased Whittle likelihood can be used to jointly estimate all parameters of such frequency-direction spectra directly from the recorded time series. When applied to North Sea buoy data, debiased Whittle likelihood inference reveals smooth evolution of spectral parameters over time. We discuss challenging practical issues including model misspecification, and provide guidelines for future application of the method.

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2106.03823 [pdf, other]

Multivariate Probabilistic Regression with Natural Gradient Boosting

Authors: Michael O'Malley, Adam M. Sykulski, Rick Lumpkin, Alejandro Schuler

Abstract: Many single-target regression problems require estimates of uncertainty along with the point predictions. Probabilistic regression algorithms are well-suited for these tasks. However, the options are much more limited when the prediction target is multivariate and a joint measure of uncertainty is required. For example, in predicting a 2D velocity vector a joint uncertainty would quantify the probability of any vector in the plane, which would be more expressive than two separate uncertainties on the x- and y- components. To enable joint probabilistic regression, we propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution. Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches. We demonstrate these claims in simulation and with a case study predicting two-dimensional oceanographic velocity data. An implementation of our method is available at https://github.com/stanfordmlgroup/ngboost.

Submitted 7 June, 2021; originally announced June 2021.

arXiv:2104.04157 [pdf, other]

Outlier detection in network revenue management

Authors: Nicola Rennie, Catherine Cleophas, Adam M. Sykulski, Florian Dost

Abstract: This paper presents an automated approach for providing ranked lists of outliers in observed demand to support analysts in network revenue management. Such network revenue management, e.g. for railway itineraries, needs accurate demand forecasts. However, demand outliers across or in parts of a network complicate accurate demand forecasting, and the network structure makes such demand outliers hard to detect. We propose a two-step approach combining clustering with functional outlier detection to identify outlying demand from network bookings observed on the leg level. The first step clusters legs to appropriately partition and pool booking patterns. The second step identifies outliers within each cluster and uses a novel aggregation method across legs to create a ranked alert list of affected instances. Our method outperforms analyses that consider leg data without regard for network implications and offers a computationally efficient alternative to storing and analysing all data on the itinerary level, especially in highly-connected networks where most customers book multi-leg products. A simulation study demonstrates the robustness of the approach and quantifies the potential revenue benefits from adjusting demand forecasts for offer optimisation. Finally, we illustrate the applicability based on empirical data obtained from Deutsche Bahn.

Submitted 24 February, 2023; v1 submitted 9 April, 2021; originally announced April 2021.

Comments: 79 pages, re-structured and additional computational results

arXiv:2012.00789 [pdf, ps, other]

Separating Mesoscale and Submesoscale Flows from Clustered Drifter Trajectories

Authors: Sarah Oscroft, Adam M. Sykulski, Jeffrey J. Early

Abstract: Drifters deployed in close proximity collectively provide a unique observational data set with which to separate mesoscale and submesoscale flows. In this paper we provide a principled approach for doing so by fitting observed velocities to a local Taylor expansion of the velocity flow field. We demonstrate how to estimate mesoscale and submesoscale quantities that evolve slowly over time, as well as their associated statistical uncertainty. We show that in practice the mesoscale component of our model can explain much first and second-moment variability in drifter velocities, especially at low frequencies. This results in much lower and more meaningful measures of submesoscale diffusivity, which would otherwise be contaminated by unresolved mesoscale flow. We quantify these effects theoretically via computing Lagrangian frequency spectra, and demonstrate the usefulness of our methodology through simulations as well as with real observations from the LatMix deployment of drifters. The outcome of this method is a full Lagrangian decomposition of each drifter trajectory into three components that represent the background, mesoscale, and submesoscale flow.

Submitted 25 December, 2020; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: Accepted in Fluids

arXiv:2008.10437 [pdf, other]

Estimating the parameters of ocean wave spectra

Authors: Jake P. Grainger, Adam M. Sykulski, Philip Jonathan, Kevin Ewans

Abstract: Wind-generated waves are often treated as stochastic processes. There is particular interest in their spectral density functions, which are often expressed in some parametric form. Such spectral density functions are used as inputs when modelling structural response or other engineering concerns. Therefore, accurate and precise recovery of the parameters of such a form, from observed wave records, is important. Current techniques are known to struggle with recovering certain parameters, especially the peak enhancement factor and spectral tail decay. We introduce an approach from the statistical literature, known as the de-biased Whittle likelihood, and address some practical concerns regarding its implementation in the context of wind-generated waves. We demonstrate, through numerical simulation, that the de-biased Whittle likelihood outperforms current techniques, such as least squares fitting, both in terms of accuracy and precision of the recovered parameters. We also provide a method for estimating the uncertainty of parameter estimates. We perform an example analysis on a data-set recorded off the coast of New Zealand, to illustrate some of the extra practical concerns that arise when estimating the parameters of spectra from observed data.

Submitted 25 March, 2021; v1 submitted 24 August, 2020; originally announced August 2020.

arXiv:2005.01171 [pdf, other]

doi 10.1371/journal.pone.0239368

Nonparametric Time Series Summary Statistics for High-Frequency Accelerometry Data from Individuals with Advanced Dementia

Authors: Keerati Suibkitwanchai, Adam M. Sykulski, Guillermo Perez Algorta, Daniel Waller, Catherine Walshe

Abstract: Accelerometry data has been widely used to measure activity and the circadian rhythm of individuals across the health sciences, in particular with people with advanced dementia. Modern accelerometers can record continuous observations on a single individual for several days at a sampling frequency of the order of one hertz. Such rich and lengthy data sets provide new opportunities for statistical insight, but also pose challenges in selecting from a wide range of possible summary statistics, and how the calculation of such statistics should be optimally tuned and implemented. In this paper, we build on existing approaches, as well as propose new summary statistics, and detail how these should be implemented with high frequency accelerometry data. We test and validate our methods on an observed data set from 26 recordings from individuals with advanced dementia and 14 recordings from individuals without dementia. We study four metrics: Interdaily stability (IS), intradaily variability (IV), the scaling exponent from detrended fluctuation analysis (DFA), and a novel nonparametric estimator which we call the proportion of variance (PoV), which calculates the strength of the circadian rhythm using spectral density estimation. We perform a detailed analysis indicating how the time series should be optimally subsampled to calculate IV, and recommend a subsampling rate of approximately 5 minutes for the dataset that has been studied. In addition, we propose the use of the DFA scaling exponent separately for daytime and nighttime, to further separate effects between individuals. We compare the relationships between all these methods and show that they effectively capture different features of the time series.

Submitted 29 September, 2020; v1 submitted 3 May, 2020; originally announced May 2020.

Journal ref: PLoS ONE 15(9): e0239368 (2020)

arXiv:2002.07774 [pdf, other]

doi 10.1175/JTECH-D-20-0134.1

Estimating the travel time and the most likely path from Lagrangian drifters

Authors: Michael O'Malley, Adam M. Sykulski, Romuald Laso-Jadart, Mohammed-Amin Madoui

Abstract: We provide a novel methodology for computing the most likely path taken by drifters between arbitrary fixed locations in the ocean. We also provide an estimate of the travel time associated with this path. Lagrangian pathways and travel times are of practical value not just in understanding surface velocities, but also in modelling the transport of ocean-borne species such as planktonic organisms, and floating debris such as plastics. In particular, the estimated travel time can be used to compute an estimated Lagrangian distance, which is often more informative than Euclidean distance in understanding connectivity between locations. Our methodology is purely data-driven, and requires no simulations of drifter trajectories, in contrast to existing approaches. Our method scales globally and can simultaneously handle multiple locations in the ocean. Furthermore, we provide estimates of the error and uncertainty associated with both the most likely path and the associated travel time.

Submitted 18 March, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: 27 pages, 10 figures in the main text. 13 pages, 8 figures in the supplemental material

arXiv:2001.05965 [pdf, ps, other]

The Elliptical Ornstein-Uhlenbeck Process

Authors: Adam M. Sykulski, Sofia C. Olhede, Hanna M. Sykulska-Lawrence

Abstract: We introduce the elliptical Ornstein-Uhlenbeck (OU) process, which is a generalisation of the well-known univariate OU process to bivariate time series. This process maps out elliptical stochastic oscillations over time in the complex plane, which are observed in many applications of coupled bivariate time series. The appeal of the model is that elliptical oscillations are generated using one simple first order stochastic differential equation (SDE), whereas alternative models require more complicated vectorised or higher order SDE representations. The second useful feature is that parameter estimation can be performed semi-parametrically in the frequency domain using the Whittle Likelihood. We determine properties of the model including the conditions for stationarity, and the geometrical structure of the elliptical oscillations. We demonstrate the utility of the model by measuring periodic and elliptical properties of Earth's polar motion.

Submitted 7 December, 2021; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: To appear in Statistics and Its Interface

arXiv:1912.05974 [pdf, ps, other]

doi 10.1016/j.ejor.2021.01.002

Identifying and Responding to Outlier Demand in Revenue Management

Authors: Nicola Rennie, Catherine Cleophas, Adam M. Sykulski, Florian Dost

Abstract: Revenue management strongly relies on accurate forecasts. Thus, when extraordinary events cause outlier demand, revenue management systems need to recognise this and adapt both forecast and controls. Many passenger transport service providers, such as railways and airlines, control the sale of tickets through revenue management. State-of-the-art systems in these industries rely on analyst expertise to identify outlier demand both online (within the booking horizon) and offline (in hindsight). So far, little research focuses on automating and evaluating the detection of outlier demand in this context. To remedy this, we propose a novel approach, which detects outliers using functional data analysis in combination with time series extrapolation. We evaluate the approach in a simulation framework, which generates outliers by varying the demand model. The results show that functional outlier detection yields better detection rates than alternative approaches for both online and offline analyses. Depending on the category of outliers, extrapolation further increases online detection performance. We also apply the procedure to a set of empirical data to demonstrate its practical implications. By evaluating the full feedback-driven system of forecast and optimisation, we generate insight on the asymmetric effects of positive and negative demand outliers. We show that identifying instances of outlier demand and adjusting the forecast in a timely fashion substantially increases revenue compared to what is earned when ignoring outliers.

Submitted 5 October, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

arXiv:1907.02447 [pdf, other]

The Debiased Spatial Whittle Likelihood

Authors: Arthur P. Guillaumin, Adam M. Sykulski, Sofia C. Olhede, Frederik J. Simons

Abstract: We provide a computationally and statistically efficient method for estimating the parameters of a stochastic covariance model observed on a regular spatial grid in any number of dimensions. Our proposed method, which we call the Debiased Spatial Whittle likelihood, makes important corrections to the well-known Whittle likelihood to account for large sources of bias caused by boundary effects and aliasing. We generalise the approach to flexibly allow for significant volumes of missing data including those with lower-dimensional substructure, and for irregular sampling boundaries. We build a theoretical framework under relatively weak assumptions which ensures consistency and asymptotic normality in numerous practical settings including missing data and non-Gaussian processes. We also extend our consistency results to multivariate processes. We provide detailed implementation guidelines which ensure the estimation procedure can be conducted in O(n log n) operations, where n is the number of points of the encapsulating rectangular grid, thus keeping the computational scalability of Fourier and Whittle-based methods for large data sets. We validate our procedure over a range of simulated and real-world settings, and compare with state-of-the-art alternatives, demonstrating the enduring practical appeal of Fourier-based methods, provided they are corrected by the procedures developed in this paper.

Submitted 26 April, 2022; v1 submitted 4 July, 2019; originally announced July 2019.

arXiv:1904.12064 [pdf, ps, other]

doi 10.1175/JTECH-D-19-0087.1

Smoothing and Interpolating Noisy GPS Data with Smoothing Splines

Authors: Jeffrey J. Early, Adam M. Sykulski

Abstract: A comprehensive methodology is provided for smoothing noisy, irregularly sampled data with non-Gaussian noise using smoothing splines. We demonstrate how the spline order and tension parameter can be chosen a priori from physical reasoning. We also show how to allow for non-Gaussian noise and outliers which are typical in GPS signals. We demonstrate the effectiveness of our methods on GPS trajectory data obtained from oceanographic floating instruments known as drifters.

Submitted 26 June, 2019; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: 16 pages, 8 figures

arXiv:1605.09107 [pdf, ps, other]

Analysis of nonstationary modulated time series with applications to oceanographic flow measurements

Authors: Arthur P. Guillaumin, Adam M. Sykulski, Sofia C. Olhede, Jeffrey J. Early, Jonathan M. Lilly

Abstract: We propose a new class of univariate nonstationary time series models, using the framework of modulated time series, which is appropriate for the analysis of rapidly-evolving time series as well as time series observations with missing data. We extend our techniques to a class of bivariate time series that are isotropic. Exact inference is often not computationally viable for time series analysis, and so we propose an estimation method based on the Whittle-likelihood, a commonly adopted pseudo-likelihood. Our inference procedure is shown to be consistent under standard assumptions, as well as having considerably lower computational cost than exact likelihood in general. We show the utility of this framework for the analysis of drifting instruments, an analysis that is key to characterising global ocean circulation and therefore also for decadal to century-scale climate understanding.

Submitted 24 January, 2017; v1 submitted 30 May, 2016; originally announced May 2016.

Comments: 31 pages, 5 figures, 3 tables

arXiv:1605.06718 [pdf, ps, other]

The De-Biased Whittle Likelihood

Authors: Adam M. Sykulski, Sofia C. Olhede, Arthur P. Guillaumin, Jonathan M. Lilly, Jeffrey J. Early

Abstract: The Whittle likelihood is a widely used and computationally efficient pseudo-likelihood. However, it is known to produce biased parameter estimates for large classes of models. We propose a method for de-biasing Whittle estimates for second-order stationary stochastic processes. The de-biased Whittle likelihood can be computed in the same $\mathcal{O}(n\log n)$ operations as the standard approach. We demonstrate the superior performance of the method in simulation studies and in application to a large-scale oceanographic dataset, where in both cases the de-biased approach reduces bias by up to two orders of magnitude, achieving estimates that are close to exact maximum likelihood, at a fraction of the computational cost. We prove that the method yields estimates that are consistent at an optimal convergence rate of $n^{-1/2}$, under weaker assumptions than standard theory, where we do not require that the power spectral density is continuous in frequency. We describe how the method can be easily combined with standard methods of bias reduction, such as tapering and differencing, to further reduce bias in parameter estimates.

Submitted 12 September, 2018; v1 submitted 21 May, 2016; originally announced May 2016.

Comments: To appear shortly in Biometrika. Full published version includes extensions of theory to non-Gaussian processes, and new simulation examples with an AR(4) and non-Gaussian process

arXiv:1605.05278 [pdf, ps, other]

doi 10.1109/MLSP.2016.7738840

Exact Simulation of Noncircular or Improper Complex-Valued Stationary Gaussian Processes using Circulant Embedding

Authors: Adam M. Sykulski, Donald B. Percival

Abstract: This paper provides an algorithm for simulating improper (or noncircular) complex-valued stationary Gaussian processes. The technique utilizes recently developed methods for multivariate Gaussian processes from the circulant embedding literature. The method can be performed in $\mathcal{O}(n\log_2 n)$ operations, where $n$ is the length of the desired sequence. The method is exact, except when eigenvalues of prescribed circulant matrices are negative. We evaluate the performance of the algorithm empirically, and provide a practical example where the method is guaranteed to be exact for all $n$, with an improper fractional Gaussian noise process.

Submitted 15 March, 2017; v1 submitted 17 May, 2016; originally announced May 2016.

Comments: Link to published version: http://ieeexplore.ieee.org/document/7738840/

Journal ref: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:1605.01684 [pdf, other]

doi 10.5194/npg-24-481-2017

Fractional Brownian motion, the Matern process, and stochastic modeling of turbulent dispersion

Authors: J. M. Lilly, A. M. Sykulski, J. J. Early, S. C. Olhede

Abstract: Stochastic processes exhibiting power-law slopes in the frequency domain are frequently well modeled by fractional Brownian motion (fBm). In particular, the spectral slope at high frequencies is associated with the degree of small-scale roughness or fractal dimension. However, a broad class of real-world signals have a high-frequency slope, like fBm, but a plateau in the vicinity of zero frequency. This low-frequency plateau, it is shown, implies that the temporal integral of the process exhibits diffusive behavior, dispersing from its initial location at a constant rate. Such processes are not well modeled by fBm, which has a singularity at zero frequency corresponding to an unbounded rate of dispersion. A more appropriate stochastic model is a much lesser-known random process called the Matern process, which is shown herein to be a damped version of fractional Brownian motion. This article first provides a thorough introduction to fractional Brownian motion, then examines the details of the Matern process and its relationship to fBm. An algorithm for the simulation of the Matern process in O(N log N) operations is given. Unlike fBm, the Matern process is found to provide an excellent match to modeling velocities from particle trajectories in an application to two-dimensional fluid turbulence.

Submitted 2 September, 2017; v1 submitted 5 May, 2016; originally announced May 2016.

Journal ref: Nonlinear Processes in Geophysics, 24: 481-514 (2017)

arXiv:1511.04128 [pdf, ps, other]

doi 10.1109/TSP.2016.2599503

A Widely Linear Complex Autoregressive Process of Order One

Authors: Adam M. Sykulski, Sofia C. Olhede, Jonathan M. Lilly

Abstract: We propose a simple stochastic process for modeling improper or noncircular complex-valued signals. The process is a natural extension of a complex-valued autoregressive process, extended to include a widely linear autoregressive term. This process can then capture elliptical, as opposed to circular, stochastic oscillations in a bivariate signal. The process is order one and is more parsimonious than alternative stochastic modeling approaches in the literature. We provide conditions for stationarity, and derive the form of the covariance and relation sequence of this model. We describe how parameter estimation can be efficiently performed both in the time and frequency domain. We demonstrate the practical utility of the process in capturing elliptical oscillations that are naturally present in seismic signals.

Submitted 15 March, 2017; v1 submitted 12 November, 2015; originally announced November 2015.

Comments: Link to published version: http://ieeexplore.ieee.org/abstract/document/7539658/

Journal ref: IEEE Transactions on Signal Processing, 64(23), 6200-6210, 2016

arXiv:1508.05593 [pdf, other]

A Power Variance Test for Nonstationarity in Complex-Valued Signals

Authors: Thomas E. Bartlett, Adam M. Sykulski, Sofia C. Olhede, Jonathan M. Lilly, Jeffrey J. Early

Abstract: We propose a novel algorithm for testing the hypothesis of nonstationarity in complex-valued signals. The implementation uses both the bootstrap and the Fast Fourier Transform such that the algorithm can be efficiently implemented in O(N log N) time, where N is the length of the observed signal. The test procedure examines the second-order structure and contrasts the observed power variance, i.e. the variability of the instantaneous variance over time, with the expected characteristics of stationary signals generated via the bootstrap method. Our algorithmic procedure is capable of learning different types of nonstationarity, such as jumps or strong sinusoidal components. We illustrate the utility of our test and algorithm through application to turbulent flow data from fluid dynamics.

Submitted 7 October, 2015; v1 submitted 23 August, 2015; originally announced August 2015.

arXiv:1312.2923 [pdf, other]

doi 10.1111/rssc.12112

Lagrangian Time Series Models for Ocean Surface Drifter Trajectories

Authors: Adam M. Sykulski, Sofia C. Olhede, Jonathan M. Lilly, Eric Danioux

Abstract: This paper proposes stochastic models for the analysis of ocean surface trajectories obtained from freely-drifting satellite-tracked instruments. The proposed time series models are used to summarise large multivariate datasets and infer important physical parameters of inertial oscillations and other ocean processes. Nonstationary time series methods are employed to account for the spatiotemporal variability of each trajectory. Because the datasets are large, we construct computationally efficient methods through the use of frequency-domain modelling and estimation, with the data expressed as complex-valued time series. We detail how practical issues related to sampling and model misspecification may be addressed using semi-parametric techniques for time series, and we demonstrate the effectiveness of our stochastic models through application to both real-world data and to numerical model output.

Submitted 21 April, 2015; v1 submitted 10 December, 2013; originally announced December 2013.

Comments: 21 pages, 10 figures

Journal ref: Journal of the Royal Statistical Society (Series C, Applied Statistics), 65(1), 29-50, 2016

arXiv:1306.5993 [pdf, ps, other]

Frequency-Domain Stochastic Modeling of Stationary Bivariate or Complex-Valued Signals

Authors: Adam M. Sykulski, Sofia C. Olhede, Jonathan M. Lilly, Jeffrey J. Early

Abstract: There are three equivalent ways of representing two jointly observed real-valued signals: as a bivariate vector signal, as a single complex-valued signal, or as two analytic signals known as the rotary components. Each representation has unique advantages depending on the system of interest and the application goals. In this paper we provide a joint framework for all three representations in the context of frequency-domain stochastic modeling. This framework allows us to extend many established statistical procedures for bivariate vector time series to complex-valued and rotary representations. These include procedures for parametrically modeling signal coherence, estimating model parameters using the Whittle likelihood, performing semi-parametric modeling, and choosing between classes of nested models using model choice. We also provide a new method of testing for impropriety in complex-valued signals, which tests for noncircular or anisotropic second-order statistical structure when the signal is represented in the complex plane. Finally, we demonstrate the usefulness of our methodology in capturing the anisotropic structure of signals observed from fluid dynamic simulations of turbulence.

Submitted 15 March, 2017; v1 submitted 25 June, 2013; originally announced June 2013.

Comments: To appear in IEEE Transactions on Signal Processing

Journal ref: IEEE Transactions on Signal Processing, 2017

Showing 1–24 of 24 results for author: Sykulski, A M