Search | arXiv e-print repository

Applications of machine learning to predict seasonal precipitation for East Africa

Authors: Michael Scheuerer, Claudio Heinrich-Mertsching, Titike K. Bahaga, Masilin Gudoshava, Thordis L. Thorarinsdottir

Abstract: Seasonal climate forecasts are commonly based on model runs from fully coupled forecasting systems that use Earth system models to represent interactions between the atmosphere, ocean, land and other Earth-system components. Recently, machine learning (ML) methods are increasingly being investigated for this task where large-scale climate variability is linked to local or regional temperature or precipitation in a linear or non-linear fashion. This paper investigates the use of interpretable ML methods to predict seasonal precipitation for East Africa in an operational setting. Dimension reduction is performed by decomposing the precipitation fields via empirical orthogonal functions (EOFs), such that only the respective factor loadings need to be predicted. Indices of large-scale climate variability--including the rate of change in individual indices as well as interactions between different indices--are then used as potential features to obtain tercile forecasts from an interpretable ML algorithm. Several research questions regarding the use of data and the effect of model complexity are studied. The results are compared against the ECMWF seasonal forecasting system (SEAS5) for three seasons--MAM, JJAS and OND--over the period 1993-2020. Compared to climatology for the same period, the ECMWF forecasts have negative skill in MAM and JJAS and significant positive skill in OND. The ML approach is on par with climatology in MAM and JJAS and has significantly positive skill in OND, if not quite at the level of the OND ECMWF forecast.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2310.02292 [pdf, other]

What you see is not what is there: Mechanisms, models, and methods for point pattern deviations

Authors: Peter Guttorp, Janine Illian, Joel Kostensalo, Mikko Kuronen, Mari Myllymäki, Aila Särkkä, Thordis L. Thorarinsdottir

Abstract: Many natural systems are observed as point patterns in time, space, or space and time. Examples include plant and cellular systems, animal colonies, earthquakes, and wildfires. In practice the locations of the points are not always observed correctly. However, in the point process literature, there has been relatively scant attention paid to the issue of errors in the location of points. In this paper, we discuss how the observed point pattern may deviate from the actual point pattern and review methods and models that exist to handle such deviations. The discussion is supplemented with several scientific illustrations.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2201.01121 [pdf, other]

doi 10.1002/qj.4403

Probabilistic prediction of the time to hard freeze using seasonal weather forecasts and survival time methods

Authors: Thea Roksvåg, Alex Lenkoski, Michael Scheuerer, Claudio Heinrich-Mertsching, Thordis L. Thorarinsdottir

Abstract: Agricultural food production and natural ecological systems depend on a range of seasonal climate indicators that describe seasonal patterns in climatological conditions. This paper proposes a probabilistic forecasting framework for predicting the end of the freeze-free season, or the time to a mean daily near-surface air temperature below 0 $^\circ$C (here referred to as hard freeze). The forecasting framework is based on the multi-model seasonal forecast ensemble provided by the Copernicus Climate Data Store and uses techniques from survival analysis for time-to-event data. The original mean daily temperature forecasts are statistically post-processed with a mean and variance correction of each model system before the time-to-event forecast is constructed. In a case study for a region in Fennoscandia covering Norway for the period 1993-2020, the proposed forecasts are found to outperform a climatology forecast from an observation-based data product at locations where the average predicted time to hard freeze is less than 40 days after the initialization date of the forecast on October 1.

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2110.15862 [pdf, other]

Assessing present and future risk of water damage using building attributes, meteorology and topography

Authors: Claudio Heinrich-Mertsching, Jens Christian Wahl, Alba Ordonez, Marita Stien, John Elvsborg, Ola Haug, Thordis L. Thorarinsdottir

Abstract: Weather-related risk makes the insurance industry inevitably concerned with climate and climate change. Buildings hit by pluvial flooding is a key manifestation of this risk, giving rise to compensations of the induced physical damages and business interruptions. In this work, we establish a nationwide, building-specific risk score for water damage associated with pluvial flooding in Norway. We fit a generalized additive model that relates the number of water damages to a wide range of explanatory variables that can be categorized into building attributes, climatological variables and topographical characteristics. The model assigns a risk score to every location in Norway, based on local topography and climate, which is not only useful for insurance companies, but also for city planning. Combining our model with an ensemble of climate projections allows us to project the (spatially varying) impacts of climate change on the risk of pluvial flooding towards the middle and end of the 21st century.

Submitted 2 November, 2021; v1 submitted 29 October, 2021; originally announced October 2021.

MSC Class: 62P05

arXiv:2110.11803 [pdf, other]

Validation of point process predictions with proper scoring rules

Authors: Claudio Heinrich-Mertsching, Thordis L. Thorarinsdottir, Peter Guttorp, Max Schneider

Abstract: We introduce a class of proper scoring rules for evaluating spatial point process forecasts based on summary statistics. These scoring rules rely on Monte-Carlo approximations of expectations and can therefore easily be evaluated for any point process model that can be simulated. In this regard, they are more flexible than the commonly used logarithmic score and other existing proper scores for point process predictions. The scoring rules allow for evaluating the calibration of a model to specific aspects of a point process, such as its spatial distribution or tendency towards clustering. Using simulations we analyze the sensitivity of our scoring rules to different aspects of the forecasts and compare it to the logarithmic score. Applications to earthquake occurrences in northern California, USA and the spatial distribution of Pacific silver firs in Findley Lake Reserve in Washington, USA highlight the usefulness of our scores for scientific model selection.

Submitted 2 November, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

arXiv:2109.11180 [pdf, other]

doi 10.1002/env.2719

Quantile based modelling of diurnal temperature range with the five-parameter lambda distribution

Authors: Silius M. Vandeskog, Thordis L. Thorarinsdottir, Ingelin Steinsland, Finn Lindgren

Abstract: Diurnal temperature range is an important variable in climate science that can provide information regarding climate variability and climate change. Changes in diurnal temperature range can have implications for hydrology, human health and ecology, among others. Yet, the statistical literature on modelling diurnal temperature range is lacking. In this paper we propose to model the distribution of diurnal temperature range using the five-parameter lambda (FPL) distribution. Additionally, in order to model diurnal temperature range with explanatory variables, we propose a distributional quantile regression model that combines quantile regression with marginal modelling using the FPL distribution. Inference is performed using the method of quantiles. The models are fitted to 30 years of daily observations of diurnal temperature range from 112 weather stations in the southern part of Norway. The flexible FPL distribution shows great promise as a model for diurnal temperature range, and performs well against competing models. The distributional quantile regression model is fitted to diurnal temperature range data using geographic, orographic and climatological explanatory variables. It performs well and captures much of the spatial variation in the distribution of diurnal temperature range in Norway.

Submitted 24 January, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: 28 pages, 9 figures; v2: revision of the introduction, more references added and minor corrections of the text

arXiv:1907.09716 [pdf, other]

Multivariate postprocessing methods for high-dimensional seasonal weather forecasts

Authors: Claudio Heinrich, Kristoffer H. Hellton, Alex Lenkoski, Thordis L. Thorarinsdottir

Abstract: Seasonal weather forecasts are crucial for long-term planning in many practical situations and skillful forecasts may have substantial economic and humanitarian implications. Current seasonal forecasting models require statistical postprocessing of the output to correct systematic biases and unrealistic uncertainty assessments. We propose a multivariate postprocessing approach utilizing covariance tapering, combined with a dimension reduction step based on principal component analysis for efficient computation. Our proposed technique can correctly and efficiently handle non-stationary, non-isotropic and negatively correlated spatial error patterns, and is applicable on a global scale. Further, a moving average approach to marginal postprocessing is shown to flexibly handle trends in biases caused by global warming, and short training periods. In an application to global sea surface temperature forecasts issued by the Norwegian Climate Prediction Model (NorCPM), our proposed methodology is shown to outperform known reference methods.

Submitted 8 November, 2019; v1 submitted 23 July, 2019; originally announced July 2019.

MSC Class: 62P02

arXiv:1901.08874 [pdf, other]

Spatial trend analysis of gridded temperature data at varying spatial scales

Authors: Ola Haug, Thordis L Thorarinsdottir, Sigrunn H Sørbye, Christian L E Franzke

Abstract: Classical assessments of trends in gridded temperature data perform independent evaluations across the grid, thus, ignoring spatial correlations in the trend estimates. In particular, this affects assessments of trend significance as evaluation of the collective significance of individual tests is commonly neglected. In this article we build a space-time hierarchical Bayesian model for temperature anomalies where the trend coefficient is modeled by a latent Gaussian random field. This enables us to calculate simultaneous credible regions for joint significance assessments. In a case study, we assess summer season trends in 65 years of gridded temperature data over Europe. We find that while spatial smoothing generally results in larger regions where the null hypothesis of no trend is rejected, this is not the case for all sub-regions.

Submitted 25 January, 2019; originally announced January 2019.

arXiv:1802.09278 [pdf, other]

doi 10.1029/2017WR022460

Bayesian regional flood frequency analysis for large catchments

Authors: Thordis L. Thorarinsdottir, Kristoffer H. Hellton, Gunnhildur H. Steinbakk, Lena Schlichting, Kolbjørn Engeland

Abstract: Regional flood frequency analysis is commonly applied in situations where there exists insufficient data at a location for a reliable estimation of flood quantiles. We develop a Bayesian hierarchical modeling framework for a regional analysis of data from 203 large catchments in Norway with the generalized extreme value (GEV) distribution as the underlying model. Generalized linear models on the parameters of the GEV distribution are able to incorporate location-specific geographic and meteorological information and thereby accommodate these effects on the flood quantiles. A Bayesian model averaging component additionally assesses model uncertainty in the effect of the proposed covariates. The resulting regional model is seen to give substantially better predictive performance than the regional model currently used in Norway.

Submitted 2 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.
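
The flood quantiles mentioned in the abstract above are quantiles of a GEV distribution. As a minimal sketch (not the authors' hierarchical model), the following computes a T-year return level from a single set of GEV parameters using the standard GEV quantile formula. The function name `gev_return_level` and the parameter values in the usage lines are illustrative assumptions, not values from the paper.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Level exceeded on average once per `return_period` years under GEV(mu, sigma, xi).

    mu: location, sigma: scale (> 0), xi: shape. This is the (1 - 1/T) quantile.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must exceed 1")
    # y = -log(1 - 1/T), computed with log1p for accuracy when T is large.
    y = -math.log1p(-1.0 / return_period)
    if xi == 0.0:
        # Gumbel limit of the GEV quantile function.
        return mu - sigma * math.log(y)
    # mu + sigma * (y**(-xi) - 1) / xi, written with expm1 for stability near xi = 0.
    return mu + sigma * math.expm1(-xi * math.log(y)) / xi


# Hypothetical parameters, chosen only to show the call.
for period in (10, 100, 1000):
    print(period, gev_return_level(mu=100.0, sigma=30.0, xi=0.1, return_period=period))
```

In a Bayesian setting such as the one the abstract describes, one would typically evaluate this function for each posterior draw of the parameters to obtain a posterior distribution of the return level.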

arXiv:1702.00728 [pdf, ps, other]

Evaluation of time series models under non-stationarity with application to the comparison of regional climate models

Authors: T. M. Erhardt, C. Czado, T. L. Thorarinsdottir

Abstract: Different disciplines pursue the aim to develop models which characterize certain phenomena as accurately as possible. Climatology is a prime example, where the temporal evolution of the climate is modeled. In order to compare and improve different models, methodology for a fair model evaluation is indispensable. As models and forecasts of a phenomenon are usually associated with uncertainty, proper scoring rules, which are tools that account for this kind of uncertainty, are an adequate choice for model evaluation. However, under the presence of non-stationarity, such a model evaluation becomes challenging, as the characteristics of the phenomenon of interest change. We provide methodology for model evaluation in the context of non-stationary time series. Our methodology assumes stationarity of the time series in shorter moving time windows. These moving windows, which are selected based on a changepoint analysis, are used to characterize the uncertainty of the phenomenon/model for the corresponding time instances. This leads to the concept of moving scores allowing for a temporal assessment of the model performance. The merits of the proposed methodology are illustrated in a simulation and a case study.

Submitted 2 February, 2017; originally announced February 2017.

arXiv:1608.06802 [pdf, other]

Predictive Inference Based on Markov Chain Monte Carlo Output

Authors: Fabian Krüger, Sebastian Lerch, Thordis L. Thorarinsdottir, Tilmann Gneiting

Abstract: In Bayesian inference, predictive distributions are typically in the form of samples generated via Markov chain Monte Carlo (MCMC) or related algorithms. In this paper, we conduct a systematic analysis of how to make and evaluate probabilistic forecasts from such simulation output. Based on proper scoring rules, we develop a notion of consistency that allows to assess the adequacy of methods for estimating the stationary distribution underlying the simulation output. We then provide asymptotic results that account for the salient features of Bayesian posterior simulators, and derive conditions under which choices from the literature satisfy our notion of consistency. Importantly, these conditions depend on the scoring rule being used, such that the choices of approximation method and scoring rule are intertwined. While the logarithmic rule requires fairly stringent conditions, the continuous ranked probability score (CRPS) yields consistent approximations under minimal assumptions. These results are illustrated in a simulation study and an economic data example. Overall, mixture-of-parameters approximations which exploit the parametric structure of Bayesian models perform particularly well. Under the CRPS, the empirical distribution function is a simple and appealing alternative option.

Submitted 24 June, 2020; v1 submitted 24 August, 2016; originally announced August 2016.
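
The abstract above notes that, under the CRPS, the empirical distribution function of the simulation output is a simple option. A minimal sketch of that idea, assuming the draws are given as a plain list of numbers: the CRPS of the empirical distribution is the mean absolute error of the draws minus half their mean pairwise absolute difference. The function name `crps_empirical` and the sample values are illustrative assumptions.

```python
def crps_empirical(draws, y):
    """CRPS of the empirical distribution of `draws`, evaluated at observation `y`.

    Uses CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| with X, X' independent draws from F.
    """
    n = len(draws)
    if n == 0:
        raise ValueError("need at least one draw")
    xs = sorted(draws)
    mean_abs_error = sum(abs(x - y) for x in xs) / n
    # For sorted xs, the sum over all ordered pairs of |x_i - x_j| equals
    # 2 * sum_k (2k - n + 1) * xs[k], which avoids the O(n^2) double loop.
    pairwise_sum = 2.0 * sum((2 * k - n + 1) * x for k, x in enumerate(xs))
    return mean_abs_error - pairwise_sum / (2.0 * n * n)


# Hypothetical posterior predictive draws and observation.
sample = [1.2, 0.7, 1.9, 1.4, 0.9]
print(crps_empirical(sample, y=1.0))
```

Sorting first keeps the cost at O(n log n), which matters for long chains. With a single draw the score reduces to the absolute error.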

arXiv:1512.09244 [pdf, other]

Forecaster's Dilemma: Extreme Events and Forecast Evaluation

Authors: Sebastian Lerch, Thordis L. Thorarinsdottir, Francesco Ravazzolo, Tilmann Gneiting

Abstract: In public discussions of the quality of forecasts, attention typically focuses on the predictive performance in cases of extreme events. However, the restriction of conventional forecast evaluation methods to subsets of extreme observations has unexpected and undesired effects, and is bound to discredit skillful forecasts when the signal-to-noise ratio in the data generating process is low. Conditioning on outcomes is incompatible with the theoretical assumptions of established forecast evaluation methods, thereby confronting forecasters with what we refer to as the forecaster's dilemma. For probabilistic forecasts, proper weighted scoring rules have been proposed as decision theoretically justifiable alternatives for forecast evaluation with an emphasis on extreme events. Using theoretical arguments, simulation experiments, and a real data study on probabilistic forecasts of U.S. inflation and gross domestic product growth, we illustrate and discuss the forecaster's dilemma along with potential remedies.

Submitted 31 December, 2015; originally announced December 2015.

arXiv:1507.05066 [pdf, other]

Spatially adaptive, Bayesian estimation for probabilistic temperature forecasts

Authors: Annette Möller, Thordis L. Thorarinsdottir, Alex Lenkoski, Tilmann Gneiting

Abstract: Uncertainty in the prediction of future weather is commonly assessed through the use of forecast ensembles that employ a numerical weather prediction model in distinct variants. Statistical postprocessing can correct for biases in the numerical model and improves calibration. We propose a Bayesian version of the standard ensemble model output statistics (EMOS) postprocessing method, in which spatially varying bias coefficients are interpreted as realizations of Gaussian Markov random fields. Our Markovian EMOS (MEMOS) technique utilizes the recently developed stochastic partial differential equation (SPDE) and integrated nested Laplace approximation (INLA) methods for computationally efficient inference. The MEMOS approach shows good predictive performance in a comparative study of 24-hour ahead temperature forecasts over Germany based on the 50-member ensemble of the European Centre for Medium-Range Weather Forecasting (ECMWF).

Submitted 15 June, 2016; v1 submitted 17 July, 2015; originally announced July 2015.

arXiv:1502.01750 [pdf, ps, other]

Gaussian Random Particles with Flexible Hausdorff Dimension

Authors: Linda V. Hansen, Thordis L. Thorarinsdottir, Evgeni Ovcharov, Tilmann Gneiting, Donald Richards

Abstract: Gaussian particles provide a flexible framework for modelling and simulating three-dimensional star-shaped random sets. In our framework, the radial function of the particle arises from a kernel smoothing, and is associated with an isotropic random field on the sphere. If the kernel is a von Mises--Fisher density, or uniform on a spherical cap, the correlation function of the associated random field admits a closed form expression. The Hausdorff dimension of the surface of the Gaussian particle reflects the decay of the correlation function at the origin, as quantified by the fractal index. Under power kernels we obtain particles with boundaries of any Hausdorff dimension between 2 and 3.

Submitted 12 February, 2015; v1 submitted 5 February, 2015; originally announced February 2015.

Comments: 22 pages, 5 figures, 3 tables; to appear in Advances in Applied Probability

MSC Class: Primary: 60D05; Secondary: 60G60; 37F35

arXiv:1407.0058 [pdf, other]

doi 10.1175/MWR-D-14-00210.1

Spatial postprocessing of ensemble forecasts for temperature using nonhomogeneous Gaussian regression

Authors: Kira Feldmann, Michael Scheuerer, Thordis L. Thorarinsdottir

Abstract: Statistical postprocessing techniques are commonly used to improve the skill of ensembles of numerical weather forecasts. This paper considers spatial extensions of the well-established nonhomogeneous Gaussian regression (NGR) postprocessing technique for surface temperature and a recent modification thereof in which the local climatology is included in the regression model for a locally adaptive postprocessing. In a comparative study employing 21 h forecasts from the COSMO-DE ensemble predictive system over Germany, two approaches for modeling spatial forecast error correlations are considered: A parametric Gaussian random field model and the ensemble copula coupling approach which utilizes the spatial rank correlation structure of the raw ensemble. Additionally, the NGR methods are compared to both univariate and spatial versions of the ensemble Bayesian model averaging (BMA) postprocessing technique.

Submitted 30 June, 2014; originally announced July 2014.

arXiv:1311.7401 [pdf, other]

Shape from Texture using Locally Scaled Point Processes

Authors: Eva-Maria Didden, Thordis Linda Thorarinsdottir, Alex Lenkoski, Christoph Schnörr

Abstract: Shape from texture refers to the extraction of 3D information from 2D images with irregular texture. This paper introduces a statistical framework to learn shape from texture where convex texture elements in a 2D image are represented through a point process. In a first step, the 2D image is preprocessed to generate a probability map corresponding to an estimate of the unnormalized intensity of the latent point process underlying the texture elements. The latent point process is subsequently inferred from the probability map in a non-parametric, model free manner. Finally, the 3D information is extracted from the point pattern by applying a locally scaled point process model where the local scaling function represents the deformation caused by the projection of a 3D surface onto a 2D image.

Submitted 28 November, 2013; originally announced November 2013.

arXiv:1310.0236 [pdf, other]

Assessing the calibration of high-dimensional ensemble forecasts using rank histograms

Authors: Thordis L. Thorarinsdottir, Michael Scheuerer, Christopher Heinz

Abstract: Any decision making process that relies on a probabilistic forecast of future events necessarily requires a calibrated forecast. This paper proposes new methods for empirically assessing forecast calibration in a multivariate setting where the probabilistic forecast is given by an ensemble of equally probable forecast scenarios. Multivariate properties are mapped to a single dimension through a pre-rank function and the calibration is subsequently assessed visually through a histogram of the ranks of the observation's pre-ranks. Average ranking assigns a pre-rank based on the average univariate rank while band depth ranking employs the concept of functional band depth where the centrality of the observation within the forecast ensemble is assessed. Several simulation examples and a case study of temperature forecast trajectories at Berlin Tegel Airport in Germany demonstrate that both multivariate ranking methods can successfully detect various sources of miscalibration and scale efficiently to high dimensional settings.

Submitted 30 June, 2014; v1 submitted 1 October, 2013; originally announced October 2013.
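
The average ranking described in the abstract above can be sketched as follows. This is one reading of the abstract's description, not the authors' code: each vector in the pooled set (observation plus ensemble members) gets a pre-rank equal to the average of its univariate ranks across dimensions, and the observation's rank among those pre-ranks is what enters the histogram. Breaking ties at random is an assumption made here, and the name `average_rank` and the toy data are hypothetical.

```python
import random


def average_rank(obs, ensemble, rng=random):
    """Rank (1..m+1) of the observation's average-rank pre-rank among an m-member ensemble.

    obs: sequence of d numbers; ensemble: sequence of m sequences of d numbers.
    """
    vectors = [list(obs)] + [list(member) for member in ensemble]
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("observation and ensemble members must share one dimension")

    def pre_rank(v):
        # Average over components of the univariate rank of v within the pooled set.
        return sum(sum(1 for u in vectors if u[k] <= v[k]) for k in range(dim)) / dim

    pre = [pre_rank(v) for v in vectors]
    below = sum(1 for p in pre[1:] if p < pre[0])
    ties = sum(1 for p in pre[1:] if p == pre[0])
    # Tied pre-ranks are resolved uniformly at random (an assumption of this sketch).
    return 1 + below + rng.randint(0, ties)


# Toy example: a 3-member ensemble of 2-dimensional forecasts.
members = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(average_rank([2.0, 3.0], members))
```

Collecting these ranks over many forecast cases and plotting their histogram gives the calibration check: a roughly flat histogram is what one would expect from a calibrated ensemble.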

arXiv:1309.6111 [pdf, other]

Bayesian hierarchical modeling of extreme hourly precipitation in Norway

Authors: Anita V. Dyrrdal, Alex Lenkoski, Thordis L. Thorarinsdottir, Frode Stordal

Abstract: Spatial maps of extreme precipitation are a critical component of flood estimation in hydrological modeling, as well as in the planning and design of important infrastructure. This is particularly relevant in countries such as Norway that have a high density of hydrological power generating facilities and are exposed to significant risk of infrastructure damage due to flooding. In this work, we estimate a spatially coherent map of the distribution of extreme hourly precipitation in Norway, in terms of return levels, by linking generalized extreme value (GEV) distributions with latent Gaussian fields in a Bayesian hierarchical model. Generalized linear models on the parameters of the GEV distribution are able to incorporate location-specific geographic and meteorological information and thereby accommodate these effects on extreme precipitation. A Gaussian field on the GEV parameters captures additional unexplained spatial heterogeneity and overcomes the sparse grid on which observations are collected. We conduct an extensive analysis of the factors that affect the GEV parameters and show that our combination is able to appropriately characterize both the spatial variability of the distribution of extreme hourly precipitation in Norway, and the associated uncertainty in these estimates.

Submitted 26 May, 2014; v1 submitted 24 September, 2013; originally announced September 2013.

arXiv:1308.0469 [pdf, other]

Bayesian Motion Estimation for Dust Aerosols

Authors: Fabian E. Bachl, Alex Lenkoski, Thordis L. Thorarinsdottir, Christoph S. Garbe

Abstract: Dust storms in the earth's major desert regions significantly influence microphysical weather processes, the CO$_2$-cycle and the global climate in general. Recent increases in the spatio-temporal resolution of remote sensing instruments have created new opportunities to understand these phenomena. However, the scale of the data collected and the inherent stochasticity of the underlying process pose significant challenges, requiring a careful combination of image processing and statistical techniques. In particular, using satellite imagery data, we develop a statistical model of atmospheric transport that relies on a latent Gaussian Markov random field (GMRF) for inference. In doing so, we make a link between the optical flow method of Horn and Schunck and the formulation of the transport process as a latent field in a generalized linear model, which enables the use of the integrated nested Laplace approximation for inference. This framework is specified such that it satisfies the so-called integrated continuity equation, thereby intrinsically expressing the divergence of the field as a multiplicative factor covering air compressibility and satellite column projection. The importance of this step -- as well as treating the problem in a fully statistical manner -- is emphasized by a simulation study where inference based on this latent GMRF clearly reduces errors of the estimated flow field.
We conclude with a study of the dynamics of dust storms formed over Saharan Africa and show that our methodology is able to accurately and coherently track the storm movement, a critical problem in this field.

Submitted 5 August, 2013; v1 submitted 2 August, 2013; originally announced August 2013.

arXiv:1305.3409 [pdf, other]

Calibration diagnostics for point process models via the probability integral transform

Authors: Thordis L. Thorarinsdottir

Abstract: We propose the use of the probability integral transform (PIT) for model validation in point process models. The simple PIT diagnostics assess the calibration of the model and can detect inconsistencies in both the intensity and the interaction structure. For the Poisson model, the PIT diagnostics can be calculated explicitly. Generally, the calibration may be assessed empirically based on random draws from the model and the method applies to processes of any dimension.

Submitted 15 May, 2013; originally announced May 2013.

arXiv:1305.2026 [pdf, other]

doi 10.3402/tellusa.v65i0.21206

Comparison of nonhomogeneous regression models for probabilistic wind speed forecasting

Authors: Sebastian Lerch, Thordis L. Thorarinsdottir

Abstract: In weather forecasting, nonhomogeneous regression is used to statistically postprocess forecast ensembles in order to obtain calibrated predictive distributions. For wind speed forecasts, the regression model is given by a truncated normal distribution where location and spread are derived from the ensemble. This paper proposes two alternative approaches which utilize the generalized extreme value (GEV) distribution. A direct alternative to the truncated normal regression is to apply a predictive distribution from the GEV family, while a regime switching approach based on the median of the forecast ensemble incorporates both distributions. In a case study on daily maximum wind speed over Germany with the forecast ensemble from the European Centre for Medium-Range Weather Forecasts, all three approaches provide calibrated and sharp predictive distributions with the regime switching approach showing the highest skill in the upper tail.

Submitted 9 May, 2013; originally announced May 2013.

Journal ref: Tellus A 2013, 65, 21206

arXiv:1302.7149 [pdf, ps, other]

doi 10.1214/13-STS443

Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling

Authors: Roman Schefzik, Thordis L. Thorarinsdottir, Tilmann Gneiting

Abstract: Critical decisions frequently rely on high-dimensional output from complex computer simulation models that show intricate cross-variable, spatial and temporal dependence structures, with weather and climate predictions being key examples. There is a strongly increasing recognition of the need for uncertainty quantification in such settings, for which we propose and review a general multi-stage procedure called ensemble copula coupling (ECC), proceeding as follows: 1. Generate a raw ensemble, consisting of multiple runs of the computer model that differ in the inputs or model parameters in suitable ways. 2. Apply statistical postprocessing techniques, such as Bayesian model averaging or nonhomogeneous regression, to correct for systematic errors in the raw ensemble, to obtain calibrated and sharp predictive distributions for each univariate output variable individually. 3. Draw a sample from each postprocessed predictive distribution. 4. Rearrange the sampled values in the rank order structure of the raw ensemble to obtain the ECC postprocessed ensemble. The use of ensembles and statistical postprocessing have become routine in weather forecasting over the past decade. We show that seemingly unrelated, recent advances can be interpreted, fused and consolidated within the framework of ECC, the common thread being the adoption of the empirical copula of the raw ensemble. Depending on the use of Quantiles, Random draws or Transformations at the sampling stage, we distinguish the ECC-Q, ECC-R and ECC-T variants, respectively.
We also describe relations to the Schaake shuffle and extant copula-based techniques. In a case study, the ECC approach is applied to predictions of temperature, pressure, precipitation and wind over Germany, based on the 50-member European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble.

Submitted 23 December, 2013; v1 submitted 28 February, 2013; originally announced February 2013.

Comments: Published at http://dx.doi.org/10.1214/13-STS443 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS443

Journal ref: Statistical Science 2013, Vol. 28, No. 4, 616-640
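
The four ECC steps listed in the abstract above can be illustrated with a minimal Python sketch of the quantile variant (ECC-Q). This is an illustration under stated assumptions, not the authors' implementation: the function name `ecc_q`, the callable `quantile_fn` standing in for the postprocessed predictive distributions of step 2, the equidistant levels i/(m+1), and the tie-breaking by member order are all choices made here for the example.

```python
import numpy as np

def ecc_q(raw_ensemble, quantile_fn):
    """Sketch of ECC with quantile sampling (hypothetical helper).

    raw_ensemble: array of shape (m, d), m ensemble members for d univariate
                  output variables (step 1 is assumed to have produced it).
    quantile_fn:  hypothetical callable (levels, j) -> quantiles of the
                  postprocessed predictive distribution for variable j
                  (the result of step 2).
    """
    raw = np.asarray(raw_ensemble, dtype=float)
    m, d = raw.shape
    # Assumption: equidistant quantile levels 1/(m+1), ..., m/(m+1).
    levels = np.arange(1, m + 1) / (m + 1)
    out = np.empty_like(raw)
    for j in range(d):
        # Step 3: draw a sample of size m from the postprocessed margin.
        sample = np.sort(quantile_fn(levels, j))
        # 0-based rank of each raw member within variable j
        # (ties are broken by member order, an assumption of this sketch).
        ranks = np.argsort(np.argsort(raw[:, j], kind="stable"), kind="stable")
        # Step 4: reorder the sample to follow the raw ensemble's rank order.
        out[:, j] = sample[ranks]
    return out
```

Calling `ecc_q(raw, quantile_fn)` returns an ensemble of the same shape whose margins come from the postprocessed distributions while each member keeps the rank it had in the raw ensemble, which is how the empirical copula of the raw ensemble is carried over.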

arXiv:1301.5927 [pdf, other]

Using proper divergence functions to evaluate climate models

Authors: Thordis L. Thorarinsdottir, Tilmann Gneiting, Nadine Gissibl

Abstract: It has been argued persuasively that, in order to evaluate climate models, the probability distributions of model output need to be compared to the corresponding empirical distributions of observed data. Distance measures between probability distributions, also called divergence functions, can be used for this purpose. We contend that divergence functions ought to be proper, in the sense that acting on modelers' true beliefs is an optimal strategy. Score divergences that derive from proper scoring rules are proper, with the integrated quadratic distance and the Kullback-Leibler divergence being particularly attractive choices. Other commonly used divergences fail to be proper. In an illustration, we evaluate and rank simulations from fifteen climate models for temperature extremes in a comparison to re-analysis data.

Submitted 16 July, 2013; v1 submitted 24 January, 2013; originally announced January 2013.

arXiv:1204.1022 [pdf, other]

doi 10.1002/env.2176

Forecast verification for extreme value distributions with an application to probabilistic peak wind prediction

Authors: Petra Friederichs, Thordis L. Thorarinsdottir

Abstract: Predictions of the uncertainty associated with extreme events are a vital component of any prediction system for such events. Consequently, the prediction system ought to be probabilistic in nature, with the predictions taking the form of probability distributions. This paper concerns probabilistic prediction systems where the data is assumed to follow either a generalized extreme value distribution (GEV) or a generalized Pareto distribution (GPD). In this setting, the properties of proper scoring rules which facilitate the assessment of the prediction uncertainty are investigated and closed-form expressions for the continuous ranked probability score (CRPS) are provided. In an application to peak wind prediction, the predictive performance of a GEV model under maximum likelihood estimation, optimum score estimation with the CRPS, and a Bayesian framework are compared. The Bayesian inference yields the highest overall prediction skill and is shown to be a valuable tool for covariate selection, while the predictions obtained under optimum CRPS estimation are the sharpest and give the best performance for high thresholds and quantiles.

Submitted 28 September, 2012; v1 submitted 4 April, 2012; originally announced April 2012.

Journal ref: Environmetrics 23 (2012) 579-594

arXiv:1202.3956 [pdf, other]

doi 10.1002/qj.2009

Multivariate probabilistic forecasting using Bayesian model averaging and copulas

Authors: Annette Möller, Alex Lenkoski, Thordis L. Thorarinsdottir

Abstract: We propose a method for post-processing an ensemble of multivariate forecasts in order to obtain a joint predictive distribution of weather. Our method utilizes existing univariate post-processing techniques, in this case ensemble Bayesian model averaging (BMA), to obtain estimated marginal distributions. However, implementing these methods individually offers no information regarding the joint distribution. To correct this, we propose the use of a Gaussian copula, which offers a simple procedure for recovering the dependence that is lost in the estimation of the ensemble BMA marginals. Our method is applied to 48-h forecasts of a set of five weather quantities using the 8-member University of Washington mesoscale ensemble. We show that our method recovers many well-understood dependencies between weather quantities and subsequently improves calibration and sharpness over both the raw ensemble and a method which does not incorporate joint distributional information.

Submitted 17 February, 2012; originally announced February 2012.

Comments: 17 pages, 4 figures

arXiv:1201.2612 [pdf, ps, other]

doi 10.1175/MWR-D-12-00028.1

Ensemble model output statistics for wind vectors

Authors: Nina Schuhen, Thordis L. Thorarinsdottir, Tilmann Gneiting

Abstract: A bivariate ensemble model output statistics (EMOS) technique for the postprocessing of ensemble forecasts of two-dimensional wind vectors is proposed, where the postprocessed probabilistic forecast takes the form of a bivariate normal probability density function. The postprocessed means and variances of the wind vector components are linearly bias-corrected versions of the ensemble means and ensemble variances, respectively, and the conditional correlation between the wind components is represented by a trigonometric function of the ensemble mean wind direction. In a case study on 48-hour forecasts of wind vectors over the North American Pacific Northwest with the University of Washington Mesoscale Ensemble, the bivariate EMOS density forecasts were calibrated and sharp, and showed considerable improvement over the raw ensemble and reference forecasts, including ensemble copula coupling.

Submitted 12 January, 2012; originally announced January 2012.

arXiv:1010.2318 [pdf, other]

Predicting Inflation: Professional Experts Versus No-Change Forecasts

Authors: Tilmann Gneiting, Thordis L. Thorarinsdottir

Abstract: We compare forecasts of United States inflation from the Survey of Professional Forecasters (SPF) to predictions made by simple statistical techniques. In nowcasting, economic expertise is persuasive. When projecting beyond the current quarter, novel yet simplistic probabilistic no-change forecasts are equally competitive. We further interpret surveys as ensembles of forecasts, and show that they can be used similarly to the ways in which ensemble prediction systems have transformed weather forecasting. Then we borrow another idea from weather forecasting, in that we apply statistical techniques to postprocess the SPF forecast, based on experience from the recent past. The foregoing conclusions remain unchanged after survey postprocessing.

Submitted 12 October, 2010; originally announced October 2010.

Showing 1–27 of 27 results for author: Thorarinsdottir, T L