-
Combinations of distributional regression algorithms with application in uncertainty estimation of corrected satellite precipitation products
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Nikolaos Doulamis,
Anastasios Doulamis
Abstract:
To facilitate effective decision-making, precipitation datasets should include uncertainty estimates. Quantile regression with machine learning has been proposed for issuing such estimates. Distributional regression offers distinct advantages over quantile regression, including the ability to model intermittency as well as a stronger ability to extrapolate beyond the training data, which is critic…
▽ More
To facilitate effective decision-making, precipitation datasets should include uncertainty estimates. Quantile regression with machine learning has been proposed for issuing such estimates. Distributional regression offers distinct advantages over quantile regression, including the ability to model intermittency as well as a stronger ability to extrapolate beyond the training data, which is critical for predicting extreme precipitation. Therefore, here, we introduce the concept of distributional regression in precipitation dataset creation, specifically for the spatial prediction task of correcting satellite precipitation products. Building upon this concept, we formulated new ensemble learning methods that can be valuable not only for spatial prediction but also for other prediction problems. These methods exploit conditional zero-adjusted probability distributions estimated with generalized additive models for location, scale and shape (GAMLSS), spline-based GAMLSS and distributional regression forests as well as their ensembles (stacking based on quantile regression and equal-weight averaging). To identify the most effective methods for our specific problem, we compared them to benchmarks using a large, multi-source precipitation dataset. Stacking was shown to be superior to individual methods at most quantile levels when evaluated with the quantile loss function. Moreover, while the relative ranking of the methods varied across different quantile levels, stacking methods, and to a lesser extent mean combiners, exhibited lower variance in their performance across different quantiles compared to individual methods that occasionally ranked extremely low. Overall, a task-specific combination of multiple distributional regression algorithms could yield significant benefits in terms of stability.
△ Less
Submitted 6 January, 2025; v1 submitted 29 June, 2024;
originally announced July 2024.
-
Ensemble learning for predictive uncertainty estimation with application to the correction of satellite precipitation products
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Nikolaos Doulamis,
Anastasios Doulamis
Abstract:
Predictions in the form of probability distributions are crucial for effective decision-making. Quantile regression enables such predictions within spatial prediction settings that aim to create improved precipitation datasets by merging remote sensing and gauge data. However, ensemble learning of quantile regression algorithms remains unexplored in this context and, at the same time, it has not b…
▽ More
Predictions in the form of probability distributions are crucial for effective decision-making. Quantile regression enables such predictions within spatial prediction settings that aim to create improved precipitation datasets by merging remote sensing and gauge data. However, ensemble learning of quantile regression algorithms remains unexplored in this context and, at the same time, it has not been substantially developed so far in the broader machine learning research landscape. Here, we introduce nine quantile-based ensemble learners and address the afore-mentioned gap in precipitation dataset creation by presenting the first application of these learners to large precipitation datasets. We employed a novel feature engineering strategy, reducing predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation. Our ensemble learners include six ensemble learning and three simple methods (mean, median, best combiner), combining six individual algorithms: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). These algorithms serve as both base learners and combiners within different ensemble learning methods. We evaluated performance against a reference method (QR) using quantile scoring functions in a large dataset comprising 15 years of monthly gauge-measured and satellite precipitation in the contiguous United States (CONUS). Ensemble learning with QR and QRNN yielded the best results across quantile levels ranging from 0.025 to 0.975, outperforming the reference method by 3.91% to 8.95%. This demonstrates the potential of ensemble learning to improve probabilistic spatial predictions.
△ Less
Submitted 6 January, 2025; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Nikolaos Doulamis,
Anastasios Doulamis
Abstract:
Merging satellite and gauge data with machine learning produces high-resolution precipitation datasets, but uncertainty estimates are often missing. We addressed the gap of how to optimally provide such estimates by benchmarking six algorithms, mostly novel even for the more general task of quantifying predictive uncertainty in spatial prediction settings. On 15 years of monthly data from over the…
▽ More
Merging satellite and gauge data with machine learning produces high-resolution precipitation datasets, but uncertainty estimates are often missing. We addressed the gap of how to optimally provide such estimates by benchmarking six algorithms, mostly novel even for the more general task of quantifying predictive uncertainty in spatial prediction settings. On 15 years of monthly data from over the contiguous United States (CONUS), we compared quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machine (LightGBM), and quantile regression neural networks (QRNN). Their ability to issue predictive precipitation quantiles at nine quantile levels (0.025, 0.050, 0.100, 0.250, 0.500, 0.750, 0.900, 0.950, 0.975), approximating the full probability distribution, was evaluated using quantile scoring functions and the quantile scoring rule. Predictors at a site were nearby values from two satellite precipitation retrievals, namely PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals), and the site's elevation. The dependent variable was the monthly mean gauge precipitation. With respect to QR, LightGBM showed improved performance in terms of the quantile scoring rule by 11.10%, also surpassing QRF (7.96%), GRF (7.44%), GBM (4.64%) and QRNN (1.73%). Notably, LightGBM outperformed all random forest variants, the current standard in spatial prediction with machine learning. To conclude, we propose a suite of machine learning algorithms for estimating uncertainty in spatial data prediction, supported with a formal evaluation framework based on scoring functions and scoring rules.
△ Less
Submitted 21 August, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Ensemble learning for blending gridded satellite and gauge-measured precipitation data
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Nikolaos Doulamis,
Anastasios Doulamis
Abstract:
Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, satellite precipitation and topography data are the predictor variables, and gauged-measured precipitation data are the dependent variables. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substa…
▽ More
Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, satellite precipitation and topography data are the predictor variables, and gauged-measured precipitation data are the dependent variables. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this study, we work towards filling in this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them. We apply the ensemble learners to monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets that span over a 15-year period and over the entire the contiguous United States (CONUS). We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions of six machine learning regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on top of the base learners to combine their independent predictions...
△ Less
Submitted 14 October, 2023; v1 submitted 9 July, 2023;
originally announced July 2023.
-
Comparison of machine learning algorithms for merging gridded satellite and earth-observed precipitation data
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Anastasios Doulamis,
Nikolaos Doulamis
Abstract:
Gridded satellite precipitation datasets are useful in hydrological applications as they cover large regions with high density. However, they are not accurate in the sense that they do not agree with ground-based measurements. An established means for improving their accuracy is to correct them by adopting machine learning algorithms. This correction takes the form of a regression problem, in whic…
▽ More
Gridded satellite precipitation datasets are useful in hydrological applications as they cover large regions with high density. However, they are not accurate in the sense that they do not agree with ground-based measurements. An established means for improving their accuracy is to correct them by adopting machine learning algorithms. This correction takes the form of a regression problem, in which the ground-based measurements have the role of the dependent variable and the satellite data are the predictor variables, together with topography factors (e.g., elevation). Most studies of this kind involve a limited number of machine learning algorithms, and are conducted for a small region and for a limited time period. Thus, the results obtained through them are of local importance and do not provide more general guidance and best practices. To provide results that are generalizable and to contribute to the delivery of best practices, we here compare eight state-of-the-art machine learning algorithms in correcting satellite precipitation data for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) gridded dataset, together with monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The results suggest that extreme gradient boosting (XGBoost) and random forests are the most accurate in terms of the squared error scoring function. The remaining algorithms can be ordered as follows from the best to the worst: Bayesian regularized feed-forward neural networks, multivariate adaptive polynomial splines (poly-MARS), gradient boosting machines (gbm), multivariate adaptive regression splines (MARS), feed-forward neural networks, linear regression.
△ Less
Submitted 3 March, 2023; v1 submitted 17 December, 2022;
originally announced January 2023.
-
Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale
Authors:
Georgia Papacharalampous,
Hristos Tyralis,
Anastasios Doulamis,
Nikolaos Doulamis
Abstract:
Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms are adopted in var…
▽ More
Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms are adopted in various fields for solving regression problems with high accuracy and low computational cost. Still, information on which tree-based ensemble algorithm to select for correcting satellite precipitation products for the contiguous United States (US) at the daily time scale is missing from the literature. In this study, we worked towards filling this methodological gap by conducting an extensive comparison between three algorithms of the category of interest, specifically between random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost). We used daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also used earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments referred to the entire contiguous US and additionally included the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared...
△ Less
Submitted 3 March, 2023; v1 submitted 31 December, 2022;
originally announced January 2023.
-
Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification
Authors:
Konstantinos Makantasis,
Alexandros Georgogiannis,
Athanasios Voulodimos,
Ioannis Georgoulas,
Anastasios Doulamis,
Nikolaos Doulamis
Abstract:
An increasing number of emerging applications in data science and engineering are based on multidimensional and structurally rich data. The irregularities, however, of high-dimensional data often compromise the effectiveness of standard machine learning algorithms. We hereby propose the Rank-R Feedforward Neural Network (FNN), a tensor-based nonlinear learning model that imposes Canonical/Polyadic…
▽ More
An increasing number of emerging applications in data science and engineering are based on multidimensional and structurally rich data. The irregularities, however, of high-dimensional data often compromise the effectiveness of standard machine learning algorithms. We hereby propose the Rank-R Feedforward Neural Network (FNN), a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters, thereby offering two core advantages compared to typical machine learning methods. First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension. Moreover, the number of the model's trainable parameters is substantially reduced, making it very efficient for small sample setting problems. We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets. Experimental evaluations show that Rank-R FNN is a computationally inexpensive alternative of ordinary FNN that achieves state-of-the-art performance on higher-order tensor data.
△ Less
Submitted 11 April, 2021;
originally announced April 2021.
-
Space-Time Domain Tensor Neural Networks: An Application on Human Pose Classification
Authors:
Konstantinos Makantasis,
Athanasios Voulodimos,
Anastasios Doulamis,
Nikolaos Bakalos,
Nikolaos Doulamis
Abstract:
Recent advances in sensing technologies require the design and development of pattern recognition models capable of processing spatiotemporal data efficiently. In this study, we propose a spatially and temporally aware tensor-based neural network for human pose classification using three-dimensional skeleton data. Our model employs three novel components. First, an input layer capable of construct…
▽ More
Recent advances in sensing technologies require the design and development of pattern recognition models capable of processing spatiotemporal data efficiently. In this study, we propose a spatially and temporally aware tensor-based neural network for human pose classification using three-dimensional skeleton data. Our model employs three novel components. First, an input layer capable of constructing highly discriminative spatiotemporal features. Second, a tensor fusion operation that produces compact yet rich representations of the data, and third, a tensor-based neural network that processes data representations in their original tensor form. Our model is end-to-end trainable and characterized by a small number of trainable parameters making it suitable for problems where the annotated data is limited. Experimental evaluation of the proposed model indicates that it can achieve state-of-the-art performance.
△ Less
Submitted 18 October, 2020; v1 submitted 17 April, 2020;
originally announced April 2020.
-
Common Mode Patterns for Supervised Tensor Subspace Learning
Authors:
Konstantinos Makantasis,
Anastasios Doulamis,
Nikolaos Doulamis,
Athanasios Voulodimos
Abstract:
In this work we propose a method for reducing the dimensionality of tensor objects in a binary classification framework. The proposed Common Mode Patterns method takes into consideration the labels' information, and ensures that tensor objects that belong to different classes do not share common features after the reduction of their dimensionality. We experimentally validate the proposed supervise…
▽ More
In this work we propose a method for reducing the dimensionality of tensor objects in a binary classification framework. The proposed Common Mode Patterns method takes into consideration the labels' information, and ensures that tensor objects that belong to different classes do not share common features after the reduction of their dimensionality. We experimentally validate the proposed supervised subspace learning technique and compared it against Multilinear Principal Component Analysis using a publicly available hyperspectral imaging dataset. Experimental results indicate that the proposed CMP method can efficiently reduce the dimensionality of tensor objects, while, at the same time, increasing the inter-class separability.
△ Less
Submitted 6 February, 2019;
originally announced February 2019.
-
Tensor-based Nonlinear Classifier for High-Order Data Analysis
Authors:
Konstantinos Makantasis,
Anastasios Doulamis,
Nikolaos Doulamis,
Antonis Nikitakis,
Athanasios Voulodimos
Abstract:
In this paper we propose a tensor-based nonlinear model for high-order data classification. The advantages of the proposed scheme are that (i) it significantly reduces the number of weight parameters, and hence of required training samples, and (ii) it retains the spatial structure of the input samples. The proposed model, called \textit{Rank}-1 FNN, is based on a modification of a feedforward neu…
▽ More
In this paper we propose a tensor-based nonlinear model for high-order data classification. The advantages of the proposed scheme are that (i) it significantly reduces the number of weight parameters, and hence of required training samples, and (ii) it retains the spatial structure of the input samples. The proposed model, called \textit{Rank}-1 FNN, is based on a modification of a feedforward neural network (FNN), such that its weights satisfy the {\it rank}-1 canonical decomposition. We also introduce a new learning algorithm to train the model, and we evaluate the \textit{Rank}-1 FNN on third-order hyperspectral data. Experimental results and comparisons indicate that the proposed model outperforms state of the art classification methods, including deep learning based ones, especially in cases with small numbers of available training samples.
△ Less
Submitted 15 February, 2018;
originally announced February 2018.