-
Warping and Matching Subsequences Between Time Series
Authors:
Simiao Lin,
Wannes Meert,
Pieter Robberechts,
Hendrik Blockeel
Abstract:
Comparing time series is essential in various tasks such as clustering and classification. While elastic distance measures that allow warping provide a robust quantitative comparison, a qualitative comparison on top of them is missing. Traditional visualizations focus on point-to-point alignment and do not convey the broader structural relationships at the level of subsequences. This limitation makes it difficult to understand how and where one time series shifts, speeds up or slows down with respect to another. To address this, we propose a novel technique that simplifies the warping path to highlight, quantify and visualize key transformations (shift, compression, difference in amplitude). By offering a clearer representation of how subsequences match between time series, our method enhances interpretability in time series comparison.
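The abstract does not detail how the warping path is simplified. To make the underlying objects concrete, here is a minimal sketch that assumes squared-error DTW without a warping window. It computes the optimal warping path and then collapses it into runs of step types. The run-length summary (`summarize_path`, with labels such as `x_advances`) is a hypothetical stand-in for the paper's simplification, not the authors' method. A long run in which only one series advances signals that a stretch of that series is compressed onto a single point of the other.

```python
import numpy as np

def dtw_path(x, y):
    """DTW between 1-D sequences x and y; returns (distance, warping path as (i, j) pairs)."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end of both series, preferring the diagonal step on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = {(i - 1, j - 1): cost[i - 1, j - 1],
                      (i - 1, j): cost[i - 1, j],
                      (i, j - 1): cost[i, j - 1]}
        i, j = min(candidates, key=candidates.get)
        path.append((i - 1, j - 1))
    return float(np.sqrt(cost[n, m])), path[::-1]

def summarize_path(path):
    """Collapse a warping path into runs of step types (hypothetical simplification).
    'match': both series advance; 'x_advances': several points of x map to one point of y;
    'y_advances': several points of y map to one point of x."""
    runs = []
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if i1 > i0 and j1 > j0:
            step = "match"
        elif i1 > i0:
            step = "x_advances"
        else:
            step = "y_advances"
        if runs and runs[-1][0] == step:
            runs[-1][1] += 1
        else:
            runs.append([step, 1])
    return [tuple(r) for r in runs]

dist, path = dtw_path([0, 0, 1], [0, 1])
print(dist, path, summarize_path(path))
```

For `x = [0, 0, 1]` and `y = [0, 1]`, the distance is 0.0, the path is `[(0, 0), (1, 0), (2, 1)]`, and the summary is `[("x_advances", 1), ("match", 1)]`: the repeated 0 in `x` is compressed onto the single 0 in `y`.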
Submitted 18 June, 2025;
originally announced June 2025.
-
Biases in Expected Goals Models Confound Finishing Ability
Authors:
Jesse Davis,
Pieter Robberechts
Abstract:
Expected Goals (xG) has emerged as a popular tool for evaluating finishing skill in soccer analytics. It involves comparing a player's cumulative xG with their actual goal output, where consistent overperformance indicates strong finishing ability. However, the assessment of finishing skill in soccer using xG remains contentious due to players' difficulty in consistently outperforming their cumulative xG. In this paper, we aim to address the limitations and nuances surrounding the evaluation of finishing skill using xG statistics. Specifically, we explore three hypotheses: (1) the deviation between actual and expected goals is an inadequate metric due to the high variance of shot outcomes and limited sample sizes, (2) the inclusion of all shots in cumulative xG calculation may be inappropriate, and (3) xG models contain biases arising from interdependencies in the data that affect skill measurement. We found that sustained overperformance of cumulative xG requires both high shot volumes and exceptional finishing, that including all shot types can obscure the finishing ability of proficient strikers, and that a persistent bias makes actual and expected goals appear closer for excellent finishers than they really are. Overall, our analysis indicates that more nuanced quantitative approaches are needed to investigate a player's finishing ability, which we achieved by using a technique from AI fairness to learn an xG model that is calibrated for multiple subgroups of players. As a concrete use case, we show that (1) the standard biased xG model underestimates Messi's GAX by 17% and (2) Messi's GAX is 27% higher than that of the typical elite high-shot-volume attacker, indicating that Messi is an even more exceptional finisher than commonly believed.
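To make hypothesis (1) concrete, here is a minimal sketch that assumes each shot is an independent Bernoulli trial whose success probability equals its xG. GAX is taken here to be actual goals minus cumulative xG. Dividing it by the standard deviation implied by the xG values shows how large a deviation must be before it stands out from chance. The function names are hypothetical, and the independence assumption is ours, not the paper's.

```python
import math

def goals_above_expected(outcomes, xg):
    """GAX as used here: actual goals (1/0 per shot) minus cumulative xG."""
    if len(outcomes) != len(xg):
        raise ValueError("outcomes and xg must have the same length")
    return sum(outcomes) - sum(xg)

def gax_z_score(outcomes, xg):
    """Standardize GAX, assuming shots are independent Bernoulli trials with p = xG."""
    sd = math.sqrt(sum(p * (1 - p) for p in xg))
    return goals_above_expected(outcomes, xg) / sd

shots, xg = [1, 0, 0, 1], [0.5, 0.1, 0.1, 0.3]
print(goals_above_expected(shots, xg), gax_z_score(shots, xg))
```

For four shots with xG values 0.5, 0.1, 0.1 and 0.3 and two goals, GAX is about 1.0 and the z-score is about 1.25. Even scoring twice the expected total is barely distinguishable from luck on such a small sample.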
Submitted 18 January, 2024;
originally announced January 2024.
-
Elastic Product Quantization for Time Series
Authors:
Pieter Robberechts,
Wannes Meert,
Jesse Davis
Abstract:
Analyzing numerous or long time series is difficult in practice due to the high storage costs and computational requirements. Therefore, techniques have been proposed to generate compact similarity-preserving representations of time series, enabling real-time similarity search on large in-memory data collections. However, the existing techniques are not ideally suited for assessing similarity when sequences are locally out of phase. In this paper, we propose the use of product quantization for efficient similarity-based comparison of time series under time warping. The idea is to first compress the data by partitioning the time series into equal-length subsequences, each of which is represented by a short code. The distance between two time series can then be efficiently approximated by pre-computed elastic distances between their codes. The partitioning into subsequences forces unwanted alignments, which we address with a pre-alignment step using the maximal overlap discrete wavelet transform (MODWT). To demonstrate the efficiency and accuracy of our method, we perform an extensive experimental evaluation on benchmark datasets in nearest-neighbor classification and clustering applications. Overall, the proposed solution emerges as a highly efficient (both in terms of memory usage and computation time) replacement for elastic measures in time series applications.
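The compression scheme described above can be sketched as follows. This is a minimal illustration under several simplifying assumptions that are ours, not the paper's. Codebooks are learned with plain Euclidean k-means, subsequences are assigned to their nearest centroid in Euclidean distance, DTW uses squared error without a window, and the MODWT pre-alignment step is omitted entirely. The class and function names (`ElasticPQ`, `dtw_sq`, `kmeans`) are hypothetical.

```python
import numpy as np

def dtw_sq(a, b):
    """Accumulated squared-error DTW cost between two 1-D arrays (no window)."""
    n, m = len(a), len(b)
    c = np.full((n + 1, m + 1), np.inf)
    c[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i, j] = (a[i - 1] - b[j - 1]) ** 2 + min(c[i - 1, j], c[i, j - 1], c[i - 1, j - 1])
    return c[n, m]

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means with Euclidean distance (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = data[labels == c].mean(axis=0)
    return centroids

class ElasticPQ:
    """Hypothetical minimal product quantizer with DTW lookup tables (no pre-alignment)."""

    def __init__(self, n_subspaces, n_codes):
        self.M, self.K = n_subspaces, n_codes

    def _split(self, X):
        # Equal-length subsequences; the series length must be divisible by M.
        return np.split(X, self.M, axis=1)

    def fit(self, X):
        self.codebooks = [kmeans(part, self.K) for part in self._split(X)]
        # Pre-compute DTW costs between every pair of centroids within each subspace.
        self.tables = [np.array([[dtw_sq(a, b) for b in cb] for a in cb]) for cb in self.codebooks]
        return self

    def encode(self, X):
        # One centroid index per subsequence: the "short code" of each series.
        return np.stack([np.argmin(((p[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
                         for p, cb in zip(self._split(X), self.codebooks)], axis=1)

    def distance(self, code_x, code_y):
        # Approximate DTW by summing the pre-computed per-subspace costs (table lookups only).
        return float(np.sqrt(sum(t[a, b] for t, a, b in zip(self.tables, code_x, code_y))))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 12))          # 20 random series of length 12
pq = ElasticPQ(n_subspaces=3, n_codes=4).fit(X)
codes = pq.encode(X)                   # shape (20, 3)
print(pq.distance(codes[0], codes[1]))
```

Because the approximate distance is a sum of per-subspace lookups, no warping can cross a subsequence boundary. This is the kind of forced alignment that the paper's pre-alignment step is meant to mitigate.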
Submitted 26 August, 2022; v1 submitted 4 January, 2022;
originally announced January 2022.
-
Leaving Goals on the Pitch: Evaluating Decision Making in Soccer
Authors:
Maaike Van Roy,
Pieter Robberechts,
Wen-Chi Yang,
Luc De Raedt,
Jesse Davis
Abstract:
Analysis of the popular expected goals (xG) metric in soccer has determined that a (slightly) smaller number of high-quality attempts will likely yield more goals than a slew of low-quality ones. This observation has driven a change in shooting behavior. Teams are passing up shots from outside the penalty box in the hope of generating a better shot closer to goal later on. This paper evaluates whether this decrease in long-distance shots is warranted. To this end, we propose a novel generic framework to reason about decision-making in soccer by combining techniques from machine learning and artificial intelligence (AI). First, we model how a team has behaved offensively over the course of two seasons by learning a Markov Decision Process (MDP) from event stream data. Second, we apply reasoning techniques from the AI literature on verification to each team's MDP. This allows us to reason about the efficacy of certain potential decisions by posing counterfactual questions to the MDP. Our key conclusion is that teams would score more goals if they shot more often from outside the penalty box in a small number of team-specific locations. The proposed framework can easily be extended and applied to analyze other aspects of the game.
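The paper poses counterfactual questions to each team's learned MDP using verification techniques. A much smaller stand-in helps show the kind of query involved. Once a policy is fixed, the MDP becomes an absorbing Markov chain, and the probability of eventually reaching a "goal" state solves a linear system. All states and probabilities below are invented for illustration. A plain linear solve replaces the verification tooling the paper relies on, and the helper names are hypothetical.

```python
import numpy as np

def goal_probability(Q, r_goal):
    """Probability of eventually scoring from each transient state of an absorbing Markov chain.
    Q[s, t]: probability of moving from transient state s to transient state t;
    r_goal[s]: probability of scoring directly from s. Solves v = r_goal + Q v."""
    return np.linalg.solve(np.eye(len(r_goal)) - Q, r_goal)

def possession_chain(p_shoot_outside):
    """Hypothetical two-state possession model; every number here is invented.
    State 0: ball outside the box, state 1: ball inside the box.
    Probability mass not listed goes to an absorbing 'possession lost' state."""
    xg_outside, p_reach_box = 0.04, 0.30   # long-shot quality; chance a non-shot reaches the box
    p_shoot_inside, xg_inside, p_back = 0.60, 0.15, 0.20
    Q = np.array([[0.0, (1 - p_shoot_outside) * p_reach_box],
                  [p_back, 0.0]])
    r_goal = np.array([p_shoot_outside * xg_outside, p_shoot_inside * xg_inside])
    return Q, r_goal

# Counterfactual query: how does the chance of scoring from outside the box
# change if the team shoots from there more often?
for s in (0.1, 0.3, 0.5):
    print(s, goal_probability(*possession_chain(s))[0])
```

With these made-up numbers, the scoring probability from outside the box rises as `p_shoot_outside` grows. Here a long shot (xG 0.04) is worth more than the expected value of trying to work the ball into the box. With other numbers the conclusion could flip, which is why the paper answers such questions per team and per location.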
Submitted 16 February, 2023; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Predicting gait events from tibial acceleration in rearfoot running: a structured machine learning approach
Authors:
Pieter Robberechts,
Rud Derie,
Pieter Van den Berghe,
Joeri Gerlo,
Dirk De Clercq,
Veerle Segers,
Jesse Davis
Abstract:
Gait event detection of the initial contact and toe off is essential for running gait analysis, allowing the derivation of parameters such as stance time. Heuristic-based methods exist to estimate these key gait events from tibial accelerometry. However, these methods are tailored to very specific acceleration profiles, which may cause complications when dealing with larger data sets and inherent biological variability. Therefore, this paper investigates whether a structured machine learning approach can achieve a more accurate prediction of running gait event timings from tibial accelerometry. Force-based event detection served as the criterion measure to assess the accuracy, repeatability and sensitivity of the predicted gait events. A heuristic method and two structured machine learning methods were employed to derive initial contact, toe off and stance time from tibial acceleration signals. Both a structured perceptron model (median absolute error of stance time estimation: 10.00 ± 8.73 ms) and a structured recurrent neural network model (median absolute error of stance time estimation: 6.50 ± 5.74 ms) significantly outperformed the existing heuristic approach (median absolute error of stance time estimation: 11.25 ± 9.52 ms) on data from 93 rearfoot runners. Thus, results indicate that a structured recurrent neural network machine learning model offers the most accurate and consistent estimation of the gait events and the derived stance time during level overground running. The machine learning methods seem less affected by intra- and inter-subject variation within the data, allowing for accurate and efficient automated data output during rearfoot overground running. This furthermore offers possibilities for real-time monitoring and biofeedback during prolonged measurements, even outside the laboratory.
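The stance-time metric reported above can be sketched as follows. This is a minimal illustration, assuming stance time is toe off minus the initial contact of the same step and that predicted and force-based events are already paired step by step. The event times are hypothetical values in milliseconds, not data from the study.

```python
import statistics

def stance_times(initial_contacts, toe_offs):
    """Stance time per step: toe off minus the initial contact of the same step (same time unit)."""
    if len(initial_contacts) != len(toe_offs):
        raise ValueError("need one toe off per initial contact")
    return [to - ic for ic, to in zip(initial_contacts, toe_offs)]

def median_absolute_error(predicted, reference):
    """Median of |predicted - reference| over paired values."""
    return statistics.median([abs(p - r) for p, r in zip(predicted, reference)])

# Hypothetical event times in ms: predicted from tibial acceleration vs. force-based reference.
pred = stance_times([0, 500, 1000], [250, 760, 1240])
ref = stance_times([0, 495, 1005], [245, 755, 1250])
print(median_absolute_error(pred, ref))  # prints 5
```

Here the per-step errors are 5, 0 and 5 ms, so the median absolute error is 5 ms. Errors in initial contact and toe off can partly cancel in the stance time, which is one reason to also evaluate the individual events.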
Submitted 14 December, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
A Bayesian Approach to In-Game Win Probability in Soccer
Authors:
Pieter Robberechts,
Jan Van Haaren,
Jesse Davis
Abstract:
In-game win probability models, which provide a sports team's likelihood of winning at each point in a game based on historical observations, are becoming increasingly popular. In baseball, basketball and American football, they have become important tools to enhance fan experience, to evaluate in-game decision-making, and to inform coaching decisions. While equally relevant in soccer, the adoption of these models is held back by technical challenges arising from the low-scoring nature of the sport.
In this paper, we introduce an in-game win probability model for soccer that addresses the shortcomings of existing models. First, we demonstrate that in-game win probability models for other sports struggle to provide accurate estimates for soccer, especially towards the end of a game. Second, we introduce a novel Bayesian statistical framework that estimates running win, tie and loss probabilities by leveraging a set of contextual game state features. An empirical evaluation on eight seasons of data for the top-five soccer leagues demonstrates that our framework provides well-calibrated probabilities. Furthermore, two use cases show its ability to enhance fan experience and to evaluate performance in crucial game situations.
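The abstract does not spell out the Bayesian model. As a point of reference only, the sketch below computes running win, tie and loss probabilities with a much simpler baseline. The remaining goals of each team are assumed to be independent Poisson variables whose means scale linearly with the time left. The scoring rates are hypothetical inputs, the function names are our own, and this is not the paper's framework: it ignores the contextual game-state features the paper uses.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson variable with mean lam (lam = 0 gives all mass at k = 0)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def win_tie_loss(score_diff, minutes_left, home_rate_per_90, away_rate_per_90, max_goals=15):
    """Home win/tie/loss probabilities from the current goal difference (home minus away),
    assuming independent Poisson goals for the remaining minutes (a simplifying assumption)."""
    lam_h = home_rate_per_90 * minutes_left / 90
    lam_a = away_rate_per_90 * minutes_left / 90
    win = tie = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_h) * poisson_pmf(a, lam_a)
            final = score_diff + h - a
            if final > 0:
                win += p
            elif final == 0:
                tie += p
            else:
                loss += p
    return win, tie, loss

print(win_tie_loss(0, 45, home_rate_per_90=1.5, away_rate_per_90=1.2))
print(win_tie_loss(1, 0, home_rate_per_90=1.5, away_rate_per_90=1.2))  # prints (1.0, 0.0, 0.0)
```

With no time left, a one-goal lead is a certain win. Near the end of a game, a model like this depends heavily on how the few remaining goals are modeled, which is where the abstract notes that existing models struggle.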
Submitted 13 August, 2021; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data
Authors:
Jessa Bekker,
Pieter Robberechts,
Jesse Davis
Abstract:
Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
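One way to incorporate the labeling mechanism into the empirical risk is through propensity scores e(x) = P(labeled | x, y = 1). The sketch below shows a propensity-weighted log loss, assuming e(x) is known for every labeled example. Each labeled example counts as a positive with weight 1/e(x) and as a negative with weight 1 - 1/e(x), while unlabeled examples count as negatives. In expectation over the labeling, this recovers the fully supervised risk. The choice of log loss and the function name are ours; the exact estimator used in the paper should be checked against it.

```python
import numpy as np

def sar_weighted_logloss(probs, labeled, propensity, eps=1e-12):
    """Propensity-weighted empirical log loss for positive-unlabeled data.
    probs: classifier's P(y=1 | x); labeled: 1 if labeled (hence positive), else 0;
    propensity: e(x) = P(labeled | x, y=1), only used for labeled examples."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    labeled = np.asarray(labeled, dtype=bool)
    e = np.asarray(propensity, dtype=float)
    loss_pos = -np.log(probs)
    loss_neg = -np.log(1 - probs)
    # Inverse propensity for labeled examples; unlabeled entries get a dummy value to avoid 1/0.
    w = 1.0 / np.where(labeled, e, 1.0)
    # Labeled: positive with weight 1/e, negative with weight 1 - 1/e; unlabeled: plain negative.
    per_example = np.where(labeled, w * loss_pos + (1 - w) * loss_neg, loss_neg)
    return float(per_example.mean())

print(sar_weighted_logloss([0.9, 0.6, 0.2], [1, 1, 0], [0.5, 0.8, 0.0]))
```

When every propensity equals 1 (all positives are labeled), the weights collapse to ordinary log loss on the observed labels. Smaller propensities up-weight the labeled examples that were unlikely to be selected, which is how the estimator corrects for the selection bias.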
Submitted 28 June, 2019; v1 submitted 10 September, 2018;
originally announced September 2018.