-
Extreme Value Modelling of Feature Residuals for Anomaly Detection in Dynamic Graphs
Authors:
Sevvandi Kandanaarachchi,
Conrad Sanderson,
Rob J. Hyndman
Abstract:
Detecting anomalies in a temporal sequence of graphs can be applied is areas such as the detection of accidents in transport networks and cyber attacks in computer networks. Existing methods for detecting abnormal graphs can suffer from multiple limitations, such as high false positive rates as well as difficulties with handling variable-sized graphs and non-trivial temporal dynamics. To address t…
▽ More
Detecting anomalies in a temporal sequence of graphs can be applied is areas such as the detection of accidents in transport networks and cyber attacks in computer networks. Existing methods for detecting abnormal graphs can suffer from multiple limitations, such as high false positive rates as well as difficulties with handling variable-sized graphs and non-trivial temporal dynamics. To address this, we propose a technique where temporal dependencies are explicitly modelled via time series analysis of a large set of pertinent graph features, followed by using residuals to remove the dependencies. Extreme Value Theory is then used to robustly model and classify any remaining extremes, aiming to produce low false positives rates. Comparative evaluations on a multitude of graph instances show that the proposed approach obtains considerably better accuracy than TensorSplat and Laplacian Anomaly Detection.
△ Less
Submitted 30 January, 2025; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Anomaly detection in dynamic networks
Authors:
Sevvandi Kandanaarachchi,
Rob J Hyndman
Abstract:
Detecting anomalies from a series of temporal networks has many applications, including road accidents in transport networks and suspicious events in social networks. While there are many methods for network anomaly detection, statistical methods are under utilised in this space even though they have a long history and proven capability in handling temporal dependencies. In this paper, we introduc…
▽ More
Detecting anomalies from a series of temporal networks has many applications, including road accidents in transport networks and suspicious events in social networks. While there are many methods for network anomaly detection, statistical methods are under utilised in this space even though they have a long history and proven capability in handling temporal dependencies. In this paper, we introduce \textit{oddnet}, a feature-based network anomaly detection method that uses time series methods to model temporal dependencies. We demonstrate the effectiveness of oddnet on synthetic and real-world datasets. The R package oddnet implements this algorithm.
△ Less
Submitted 13 October, 2022;
originally announced October 2022.
-
LoMEF: A Framework to Produce Local Explanations for Global Model Time Series Forecasts
Authors:
Dilini Rajapaksha,
Christoph Bergmeir,
Rob J Hyndman
Abstract:
Global Forecasting Models (GFM) that are trained across a set of multiple time series have shown superior results in many forecasting competitions and real-world applications compared with univariate forecasting approaches. One aspect of the popularity of statistical forecasting models such as ETS and ARIMA is their relative simplicity and interpretability (in terms of relevant lags, trend, season…
▽ More
Global Forecasting Models (GFM) that are trained across a set of multiple time series have shown superior results in many forecasting competitions and real-world applications compared with univariate forecasting approaches. One aspect of the popularity of statistical forecasting models such as ETS and ARIMA is their relative simplicity and interpretability (in terms of relevant lags, trend, seasonality, and others), while GFMs typically lack interpretability, especially towards particular time series. This reduces the trust and confidence of the stakeholders when making decisions based on the forecasts without being able to understand the predictions. To mitigate this problem, in this work, we propose a novel local model-agnostic interpretability approach to explain the forecasts from GFMs. We train simpler univariate surrogate models that are considered interpretable (e.g., ETS) on the predictions of the GFM on samples within a neighbourhood that we obtain through bootstrapping or straightforwardly as the one-step-ahead global black-box model forecasts of the time series which needs to be explained. After, we evaluate the explanations for the forecasts of the global models in both qualitative and quantitative aspects such as accuracy, fidelity, stability and comprehensibility, and are able to show the benefits of our approach.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
A Look at the Evaluation Setup of the M5 Forecasting Competition
Authors:
Hansika Hewamalage,
Pablo Montero-Manso,
Christoph Bergmeir,
Rob J Hyndman
Abstract:
Forecast evaluation plays a key role in how empirical evidence shapes the development of the discipline. Domain experts are interested in error measures relevant for their decision making needs. Such measures may produce unreliable results. Although reliability properties of several metrics have already been discussed, it has hardly been quantified in an objective way. We propose a measure named R…
▽ More
Forecast evaluation plays a key role in how empirical evidence shapes the development of the discipline. Domain experts are interested in error measures relevant for their decision making needs. Such measures may produce unreliable results. Although reliability properties of several metrics have already been discussed, it has hardly been quantified in an objective way. We propose a measure named Rank Stability, which evaluates how much the rankings of an experiment differ in between similar datasets, when the models and errors are constant. We use this to study the evaluation setup of the M5. We find that the evaluation setup of the M5 is less reliable than other measures. The main drivers of instability are hierarchical aggregation and scaling. Price-weighting reduces the stability of all tested error measures. Scale normalization of the M5 error measure results in less stability than other scale-free errors. Hierarchical levels taken separately are less stable with more aggregation, and their combination is even less stable than individual levels. We also show positive tradeoffs of retaining aggregation importance without affecting stability. Aggregation and stability can be linked to the influence of much debated magic numbers. Many of our findings can be applied to general hierarchical forecast benchmarking.
△ Less
Submitted 8 August, 2021;
originally announced August 2021.
-
Detection of cybersecurity attacks through analysis of web browsing activities using principal component analysis
Authors:
Insha Ullah,
Kerrie Mengersen,
Rob J Hyndman,
James McGree
Abstract:
Organizations such as government departments and financial institutions provide online service facilities accessible via an increasing number of internet connected devices which make their operational environment vulnerable to cyber attacks. Consequently, there is a need to have mechanisms in place to detect cyber security attacks in a timely manner. A variety of Network Intrusion Detection System…
▽ More
Organizations such as government departments and financial institutions provide online service facilities accessible via an increasing number of internet connected devices which make their operational environment vulnerable to cyber attacks. Consequently, there is a need to have mechanisms in place to detect cyber security attacks in a timely manner. A variety of Network Intrusion Detection Systems (NIDS) have been proposed and can be categorized into signature-based NIDS and anomaly-based NIDS. The signature-based NIDS, which identify the misuse through scanning the activity signature against the list of known attack activities, are criticized for their inability to identify new attacks (never-before-seen attacks). Among anomaly-based NIDS, which declare a connection anomalous if it expresses deviation from a trained model, the unsupervised learning algorithms circumvent this issue since they have the ability to identify new attacks. In this study, we use an unsupervised learning algorithm based on principal component analysis to detect cyber attacks. In the training phase, our approach has the advantage of also identifying outliers in the training dataset. In the monitoring phase, our approach first identifies the affected dimensions and then calculates an anomaly score by aggregating across only those components that are affected by the anomalies. We explore the performance of the algorithm via simulations and through two applications, namely to the UNSW-NB15 dataset recently released by the Australian Centre for Cyber Security and to the well-known KDD'99 dataset. The algorithm is scalable to large datasets in both training and monitoring phases, and the results from both the simulated and real datasets show that the method has promise in detecting suspicious network activities.
△ Less
Submitted 27 July, 2021;
originally announced July 2021.
-
Monash Time Series Forecasting Archive
Authors:
Rakshitha Godahewa,
Christoph Bergmeir,
Geoffrey I. Webb,
Rob J. Hyndman,
Pablo Montero-Manso
Abstract:
Many businesses and industries nowadays rely on large quantities of time series data making time series forecasting an important research area. Global forecasting models that are trained across sets of time series have shown a huge potential in providing accurate forecasts compared with the traditional univariate forecasting models that work on isolated series. However, there are currently no comp…
▽ More
Many businesses and industries nowadays rely on large quantities of time series data making time series forecasting an important research area. Global forecasting models that are trained across sets of time series have shown a huge potential in providing accurate forecasts compared with the traditional univariate forecasting models that work on isolated series. However, there are currently no comprehensive time series archives for forecasting that contain datasets of time series from similar sources available for the research community to evaluate the performance of new global forecasting algorithms over a wide variety of datasets. In this paper, we present such a comprehensive time series forecasting archive containing 20 publicly available time series datasets from varied domains, with different characteristics in terms of frequency, series lengths, and inclusion of missing values. We also characterise the datasets, and identify similarities and differences among them, by conducting a feature analysis. Furthermore, we present the performance of a set of standard baseline forecasting methods over all datasets across eight error metrics, for the benefit of researchers using the archive to benchmark their forecasting algorithms.
△ Less
Submitted 14 May, 2021;
originally announced May 2021.
-
Computationally Efficient Learning of Statistical Manifolds
Authors:
Fan Cheng,
Anastasios Panagiotelis,
Rob J Hyndman
Abstract:
Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck in statistical manifold learning when observations of probability distributions rather than vector-valued variables are available or when data size is large. We resolve this problem by proposing a new method for approxima…
▽ More
Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck in statistical manifold learning when observations of probability distributions rather than vector-valued variables are available or when data size is large. We resolve this problem by proposing a new method for approximation in statistical manifold learning. The novelty of our approximation is the strongly consistent distance estimators based on independent and identically distributed samples from probability distributions. By exploiting the connection between Hellinger/total variation distance for discrete distributions and the L2/L1 norm, we demonstrate that the proposed distance estimators, combined with approximate nearest neighbor searching, could largely improve the computational efficiency with little to no loss in the accuracy of manifold embedding. The result is robust to different manifold learning algorithms and different approximate nearest neighbor algorithms. The proposed method is applied to learning statistical manifolds of electricity usage. This application demonstrates how underlying structures in high dimensional data, including anomalies, can be visualized and identified, in a way that is scalable to large datasets.
△ Less
Submitted 9 March, 2022; v1 submitted 22 February, 2021;
originally announced March 2021.
-
Model selection in reconciling hierarchical time series
Authors:
Mahdi Abolghasemi,
Rob J Hyndman,
Evangelos Spiliotis,
Christoph Bergmeir
Abstract:
Model selection has been proven an effective strategy for improving accuracy in time series forecasting applications. However, when dealing with hierarchical time series, apart from selecting the most appropriate forecasting model, forecasters have also to select a suitable method for reconciling the base forecasts produced for each series to make sure they are coherent. Although some hierarchical…
▽ More
Model selection has been proven an effective strategy for improving accuracy in time series forecasting applications. However, when dealing with hierarchical time series, apart from selecting the most appropriate forecasting model, forecasters have also to select a suitable method for reconciling the base forecasts produced for each series to make sure they are coherent. Although some hierarchical forecasting methods like minimum trace are strongly supported both theoretically and empirically for reconciling the base forecasts, there are still circumstances under which they might not produce the most accurate results, being outperformed by other methods. In this paper we propose an approach for dynamically selecting the most appropriate hierarchical forecasting method and succeeding better forecasting accuracy along with coherence. The approach, to be called conditional hierarchical forecasting, is based on Machine Learning classification methods and uses time series features as leading indicators for performing the selection for each hierarchy examined considering a variety of alternatives. Our results suggest that conditional hierarchical forecasting leads to significantly more accurate forecasts than standard approaches, especially at lower hierarchical levels.
△ Less
Submitted 29 October, 2020; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Forecasting for Social Good
Authors:
Bahman Rostami-Tabar,
Mohammad M Ali,
Tao Hong,
Rob J Hyndman,
Michael D Porter,
Aris Syntetos
Abstract:
Forecasting plays a critical role in the development of organisational business strategies. Despite a considerable body of research in the area of forecasting, the focus has largely been on the financial and economic outcomes of the forecasting process as opposed to societal benefits. Our motivation in this study is to promote the latter, with a view to using the forecasting process to advance soc…
▽ More
Forecasting plays a critical role in the development of organisational business strategies. Despite a considerable body of research in the area of forecasting, the focus has largely been on the financial and economic outcomes of the forecasting process as opposed to societal benefits. Our motivation in this study is to promote the latter, with a view to using the forecasting process to advance social and environmental objectives such as equality, social justice and sustainability. We refer to such forecasting practices as Forecasting for Social Good (FSG) where the benefits to society and the environment take precedence over economic and financial outcomes. We conceptualise FSG and discuss its scope and boundaries in the context of the "Doughnut theory". We present some key attributes that qualify a forecasting process as FSG: it is concerned with a real problem, it is focused on advancing social and environmental goals and prioritises these over conventional measures of economic success, and it has a broad societal impact. We also position FSG in the wider literature on forecasting and social good practices. We propose an FSG maturity framework as the means to engage academics and practitioners with research in this area. Finally, we highlight that FSG: (i) cannot be distilled to a prescriptive set of guidelines, (ii) is scalable, and (iii) has the potential to make significant contributions to advancing social objectives.
△ Less
Submitted 24 September, 2020;
originally announced September 2020.
-
Principles and Algorithms for Forecasting Groups of Time Series: Locality and Globality
Authors:
Pablo Montero-Manso,
Rob J Hyndman
Abstract:
Forecasting groups of time series is of increasing practical importance, e.g. forecasting the demand for multiple products offered by a retailer or server loads within a data center. The local approach to this problem considers each time series separately and fits a function or model to each series. The global approach fits a single function to all series. For groups of similar time series, global…
▽ More
Forecasting groups of time series is of increasing practical importance, e.g. forecasting the demand for multiple products offered by a retailer or server loads within a data center. The local approach to this problem considers each time series separately and fits a function or model to each series. The global approach fits a single function to all series. For groups of similar time series, global methods outperform the more established local methods. However, recent results show good performance of global models even in heterogeneous datasets. This suggests a more general applicability of global methods, potentially leading to more accurate tools and new scenarios to study.
Formalizing the setting of forecasting a set of time series with local and global methods, we provide the following contributions:
1) Global methods are not more restrictive than local methods, both can produce the same forecasts without any assumptions about similarity of the series. Global models can succeed in a wider range of problems than previously thought.
2) Basic generalization bounds for local and global algorithms. The complexity of local methods grows with the size of the set while it remains constant for global methods. In large datasets, a global algorithm can afford to be quite complex and still benefit from better generalization. These bounds serve to clarify and support recent experimental results in the field, and guide the design of new algorithms. For the class of autoregressive models, this implies that global models can have much larger memory than local methods.
3) In an extensive empirical study, purposely naive algorithms derived from these principles, such as global linear models or deep networks result in superior accuracy.
In particular, global linear models can provide competitive accuracy with two orders of magnitude fewer parameters than local methods.
△ Less
Submitted 26 March, 2021; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Hierarchical forecast reconciliation with machine learning
Authors:
Evangelos Spiliotis,
Mahdi Abolghasemi,
Rob J Hyndman,
Fotios Petropoulos,
Vassilios Assimakopoulos
Abstract:
Hierarchical forecasting methods have been widely used to support aligned decision-making by providing coherent forecasts at different aggregation levels. Traditional hierarchical forecasting approaches, such as the bottom-up and top-down methods, focus on a particular aggregation level to anchor the forecasts. During the past decades, these have been replaced by a variety of linear combination ap…
▽ More
Hierarchical forecasting methods have been widely used to support aligned decision-making by providing coherent forecasts at different aggregation levels. Traditional hierarchical forecasting approaches, such as the bottom-up and top-down methods, focus on a particular aggregation level to anchor the forecasts. During the past decades, these have been replaced by a variety of linear combination approaches that exploit information from the complete hierarchy to produce more accurate forecasts. However, the performance of these combination methods depends on the particularities of the examined series and their relationships. This paper proposes a novel hierarchical forecasting approach based on machine learning that deals with these limitations in three important ways. First, the proposed method allows for a non-linear combination of the base forecasts, thus being more general than the linear approaches. Second, it structurally combines the objectives of improved post-sample empirical forecasting accuracy and coherence. Finally, due to its non-linear nature, our approach selectively combines the base forecasts in a direct and automated way without requiring that the complete information must be used for producing reconciled forecasts for each series and level. The proposed method is evaluated both in terms of accuracy and bias using two different data sets coming from the tourism and retail industries. Our results suggest that the proposed method gives superior point forecasts than existing approaches, especially when the series comprising the hierarchy are not characterized by the same patterns.
△ Less
Submitted 3 June, 2020;
originally announced June 2020.
-
Machine learning applications in time series hierarchical forecasting
Authors:
Mahdi Abolghasemi,
Rob J Hyndman,
Garth Tarr,
Christoph Bergmeir
Abstract:
Hierarchical forecasting (HF) is needed in many situations in the supply chain (SC) because managers often need different levels of forecasts at different levels of SC to make a decision. Top-Down (TD), Bottom-Up (BU) and Optimal Combination (COM) are common HF models. These approaches are static and often ignore the dynamics of the series while disaggregating them. Consequently, they may fail to…
▽ More
Hierarchical forecasting (HF) is needed in many situations in the supply chain (SC) because managers often need different levels of forecasts at different levels of SC to make a decision. Top-Down (TD), Bottom-Up (BU) and Optimal Combination (COM) are common HF models. These approaches are static and often ignore the dynamics of the series while disaggregating them. Consequently, they may fail to perform well if the investigated group of time series are subject to large changes such as during the periods of promotional sales. We address the HF problem of predicting real-world sales time series that are highly impacted by promotion. We use three machine learning (ML) models to capture sales variations over time. Artificial neural networks (ANN), extreme gradient boosting (XGboost), and support vector regression (SVR) algorithms are used to estimate the proportions of lower-level time series from the upper level. We perform an in-depth analysis of 61 groups of time series with different volatilities and show that ML models are competitive and outperform some well-established models in the literature.
△ Less
Submitted 1 December, 2019;
originally announced December 2019.
-
Anomaly Detection in High Dimensional Data
Authors:
Priyanga Dilini Talagala,
Rob J. Hyndman,
Kate Smith-Miles
Abstract:
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance level, under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation that deviates ma…
▽ More
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance level, under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation that deviates markedly from the majority with a large distance gap. An approach based on extreme value theory is used for the anomalous threshold calculation. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies present in other data structures using feature engineering. We show the situations where the stray algorithm outperforms the HDoutliers algorithm both in accuracy and computational time. This framework is implemented in the open source R package stray.
△ Less
Submitted 12 August, 2019;
originally announced August 2019.
-
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics
Authors:
Yanfei Kang,
Rob J Hyndman,
Feng Li
Abstract:
The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with div…
▽ More
The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires either collecting or simulating a diverse set of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We simulate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
△ Less
Submitted 7 January, 2020; v1 submitted 7 March, 2019;
originally announced March 2019.