-
Comparative evaluation of point process forecasts
Authors:
Jonas Brehmer,
Tilmann Gneiting,
Marcus Herrmann,
Warner Marzocchi,
Martin Schlather,
Kirstin Strokorb
Abstract:
Stochastic models of point patterns in space and time are widely used to issue forecasts or assess risk, and often they affect societally relevant decisions. We adapt the concept of consistent scoring functions and proper scoring rules, which are statistically principled tools for the comparative evaluation of predictive performance, to the point process setting, and place both new and existing methodology in this framework. With reference to earthquake likelihood model testing, we demonstrate that extant techniques apply in much broader contexts than previously thought. In particular, the Poisson log-likelihood can be used for theoretically principled comparative forecast evaluation in terms of cell expectations. We illustrate the approach in a simulation study and in a comparative evaluation of operational earthquake forecasts for Italy.
Submitted 14 July, 2023; v1 submitted 22 March, 2021;
originally announced March 2021.
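The abstract's statement that the Poisson log-likelihood can be used to compare forecasts in terms of cell expectations can be illustrated with a minimal sketch. It assumes each forecast is reported as one positive expected count per cell and writes the score in negative orientation (smaller is better), dropping the log(n!) term because it does not depend on the forecast. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def poisson_score(forecast_means, observed_counts):
    """Negatively oriented Poisson log-likelihood score, summed over cells.

    For a forecast mean x > 0 and an observed count n, the per-cell score is
    x - n * log(x). The term log(n!) is omitted: it is the same for every
    forecast, so it does not affect comparisons.
    """
    if len(forecast_means) != len(observed_counts):
        raise ValueError("forecast and observation must cover the same cells")
    total = 0.0
    for x, n in zip(forecast_means, observed_counts):
        if x <= 0:
            raise ValueError("forecast cell expectations must be positive")
        total += x - n * math.log(x)
    return total


# Toy comparison on three cells (illustrative numbers only).
counts = [2, 0, 1]
forecast_a = [2.0, 0.1, 1.0]
forecast_b = [1.0, 1.0, 1.0]
score_a = poisson_score(forecast_a, counts)
score_b = poisson_score(forecast_b, counts)
preferred = "a" if score_a < score_b else "b"
```

In practice the scores would be averaged over many forecast periods before two models are compared; a single observation, as here, only shows the mechanics.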
-
Scoring Interval Forecasts: Equal-Tailed, Shortest, and Modal Interval
Authors:
Jonas Brehmer,
Tilmann Gneiting
Abstract:
We consider different types of predictive intervals and ask whether they are elicitable, i.e. are unique minimizers of a loss or scoring function in expectation. The equal-tailed interval is elicitable, with a rich class of suitable loss functions, though subject to translation invariance, or positive homogeneity and differentiability, the Winkler interval score becomes a unique choice. The modal interval also is elicitable, with a sole consistent scoring function, up to equivalence. However, the shortest interval fails to be elicitable relative to practically relevant classes of distributions. These results provide guidance in interval forecast evaluation and support recent choices of performance measures in forecast competitions.
Submitted 23 October, 2020; v1 submitted 11 July, 2020;
originally announced July 2020.
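The Winkler interval score named in the abstract can be sketched as follows, using its commonly stated form for an equal-tailed interval [lower, upper] at nominal level 1 - alpha: the interval width plus a penalty of 2/alpha times the distance by which the observation misses the interval. The function and argument names are hypothetical, and the numbers are illustrative.

```python
def interval_score(lower, upper, y, alpha):
    """Winkler interval score for an equal-tailed (1 - alpha) interval.

    Smaller is better: narrow intervals are rewarded, and observations
    falling outside the interval incur a penalty scaled by 2 / alpha.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if lower > upper:
        raise ValueError("lower endpoint must not exceed upper endpoint")
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


# A 90% interval [0, 10]: one covered and one missed observation.
covered = interval_score(0.0, 10.0, 5.0, alpha=0.1)
missed = interval_score(0.0, 10.0, 12.0, alpha=0.1)
```

When the observation is covered, the score is just the width of the interval; a miss by 2 units at alpha = 0.1 adds a penalty of 20 times the miss distance.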
-
Why scoring functions cannot assess tail properties
Authors:
Jonas Brehmer,
Kirstin Strokorb
Abstract:
Motivated by the growing interest in sound forecast evaluation techniques with an emphasis on distribution tails rather than average behaviour, we investigate a fundamental question arising in this context: Can statistical features of distribution tails be elicitable, i.e. be the unique minimizer of an expected score? We demonstrate that expected scores are not suitable to distinguish genuine tail properties in a very strong sense. Specifically, we introduce the class of max-functionals, which contains key characteristics from extreme value theory, for instance the extreme value index. We show that its members fail to be elicitable and that their elicitation complexity is in fact infinite under mild regularity assumptions. Further we prove that, even if the information of a max-functional is reported via the entire distribution function, a proper scoring rule cannot separate max-functional values. These findings highlight the caution needed in forecast evaluation and statistical inference if relevant information is encoded by such functionals.
Submitted 7 October, 2019; v1 submitted 10 May, 2019;
originally announced May 2019.
-
Properization: Constructing Proper Scoring Rules via Bayes Acts
Authors:
Jonas Brehmer,
Tilmann Gneiting
Abstract:
Scoring rules serve to quantify predictive performance. A scoring rule is proper if truth telling is an optimal strategy in expectation. Subject to customary regularity conditions, every scoring rule can be made proper, by applying a special case of the Bayes act construction studied by Grünwald and Dawid (2004) and Dawid (2007), to which we refer as properization. We discuss examples from the recent literature and apply the construction to create new types, and reinterpret existing forms, of proper scoring rules and consistent scoring functions. In an abstract setting, we formulate sufficient conditions under which Bayes acts exist and scoring rules can be made proper.
Submitted 16 August, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
Elicitability and its Application in Risk Management
Authors:
Jonas Brehmer
Abstract:
Elicitability is a property of $\mathbb{R}^k$-valued functionals defined on a set of distribution functions. These functionals represent statistical properties of a distribution, for instance its mean, variance, or median. They are called elicitable if there exists a scoring function such that the expected score under a distribution takes its unique minimum at the functional value of this distribution. If such a scoring function exists, it is called strictly consistent for the functional. Motivated by the recent findings of Fissler and Ziegel concerning higher order elicitability, this thesis reviews the most important results, examples, and applications which are found in the relevant literature. Moreover, we also contribute our own examples and findings in order to give the reader a well-founded overview of the topic as well as of the most used tools and techniques. We include necessary and sufficient conditions for strictly consistent scoring functions, several elicitable as well as non-elicitable functionals and the use of elicitability in forecast comparison, regression, and estimation. Special emphasis is placed on quantitative risk management and the result that Value at Risk and Expected Shortfall are jointly elicitable.
Submitted 30 July, 2017;
originally announced July 2017.
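The definition of elicitability in the abstract can be made concrete with a small numerical sketch. It uses two standard examples, squared error for the mean and absolute error for the median, and replaces the expectation under a distribution by the average over a small sample. The helper names, the sample, and the candidate grid are hypothetical choices for illustration only.

```python
def squared_error(x, y):
    """Scoring function that is strictly consistent for the mean."""
    return (x - y) ** 2


def absolute_error(x, y):
    """Scoring function that is consistent for the median."""
    return abs(x - y)


def minimize_average_score(score, sample, candidates):
    """Return the candidate report with the smallest average score."""
    def average_score(x):
        return sum(score(x, y) for y in sample) / len(sample)
    return min(candidates, key=average_score)


# Empirical distribution with mean 4 and median 3 (illustrative data).
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
candidates = [0.5 * k for k in range(21)]  # grid 0.0, 0.5, ..., 10.0

best_under_squared = minimize_average_score(squared_error, sample, candidates)
best_under_absolute = minimize_average_score(absolute_error, sample, candidates)
```

On this sample the minimizer under squared error is the sample mean and the minimizer under absolute error is the sample median, which is the sense in which a scoring function elicits a functional.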