Search | arXiv e-print repository

Expectiles as basis risk-optimal payment schemes in parametric insurance

Authors: Markus Johannes Maier, Matthias Scherer

Abstract: Payments in parametric insurance solutions are linked to an index and thus decoupled from policyholders' true losses. While this principle has appealing operational benefits compared to traditional indemnity coverage (it is very efficient and cost-effective), a downside is the discrepancy between payouts and actual damage, called basis risk. We show that in an asymmetrically weighted mean squared error framework, the basis risk-minimizing payment schemes for pure parametric and parametric index insurance contracts can be expressed as conditional expectiles of policyholders' true loss given a compensation-triggering incident. We provide connections to stochastic orderings and demonstrate that regression approaches allow easy implementation in practice. We illustrate our results with parametric coverage for cyber risks and agricultural insurance.
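The abstract characterizes basis risk-optimal payments as expectiles. As a hedged illustration only (not the authors' code), the sketch below computes a sample τ-expectile, assuming the standard definition: the value m minimizing the asymmetric squared loss Σ |τ − 1{y < m}| (y − m)². The bisection solver, the function name `expectile`, and the loss data are assumptions introduced here. Applied to losses from events in which the index triggered, it gives an empirical analogue of a conditional expectile.

```python
from typing import Sequence


def expectile(samples: Sequence[float], tau: float, iters: int = 200) -> float:
    """Sample tau-expectile: the m minimising sum_i |tau - 1{y_i < m}| * (y_i - m)**2.

    Solved by bisection on the first-order condition of that loss.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    ys = [float(y) for y in samples]
    if not ys:
        raise ValueError("need at least one sample")

    def first_order(m: float) -> float:
        # tau * sum (y - m)^+  -  (1 - tau) * sum (m - y)^+ ; decreasing in m,
        # >= 0 at min(ys) and <= 0 at max(ys), so its root is the expectile.
        above = sum(y - m for y in ys if y > m)
        below = sum(m - y for y in ys if y < m)
        return tau * above - (1.0 - tau) * below

    lo, hi = min(ys), max(ys)
    for _ in range(iters):  # 200 halvings reach double precision
        mid = 0.5 * (lo + hi)
        if first_order(mid) > 0.0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


# Hypothetical losses observed in events where the index triggered a payout.
triggered_losses = [0.0, 2.0, 3.0, 10.0]
for tau in (0.5, 0.8):
    print(tau, round(expectile(triggered_losses, tau), 4))
```

With τ = 0.5 the expectile is the sample mean; larger τ penalizes underpayment (loss above payment) more heavily and therefore yields a higher payment.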

Submitted 5 May, 2025; originally announced May 2025.

Comments: 34 pages, 8 figures

MSC Class: 91G05

arXiv:2208.12334 [pdf, other]

doi 10.1002/jrsm.1703

Footprint of publication selection bias on meta-analyses in medicine, environmental sciences, psychology, and economics

Authors: František Bartoš, Maximilian Maier, Eric-Jan Wagenmakers, Franziska Nippold, Hristos Doucouliagos, John P. A. Ioannidis, Willem M. Otte, Martina Sladekova, Teshome K. Deressa, Stephan B. Bruns, Daniele Fanelli, T. D. Stanley

Abstract: Publication selection bias undermines the systematic accumulation of evidence. To assess the extent of this problem, we survey over 68,000 meta-analyses containing over 700,000 effect size estimates from medicine (67,386/597,699), environmental sciences (199/12,707), psychology (605/23,563), and economics (327/91,421). Our results indicate that meta-analyses in economics are the most severely contaminated by publication selection bias, closely followed by meta-analyses in environmental sciences and psychology, whereas meta-analyses in medicine are contaminated the least. After adjusting for publication selection bias, the median probability of the presence of an effect decreased from 99.9% to 29.7% in economics, from 98.9% to 55.7% in psychology, from 99.8% to 70.7% in environmental sciences, and from 38.0% to 29.7% in medicine. The median absolute effect sizes (in terms of standardized mean differences) decreased from d = 0.20 to d = 0.07 in economics, from d = 0.37 to d = 0.26 in psychology, from d = 0.62 to d = 0.43 in environmental sciences, and from d = 0.24 to d = 0.13 in medicine.
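The totals and effect-size changes quoted in this abstract can be cross-checked with a few lines of arithmetic. This sketch only re-derives numbers already stated above: the sums of the per-field counts and the relative drop in median |d|. "Relative shrinkage" is a label introduced here for illustration, not a quantity reported by the authors.

```python
# (meta-analyses, effect size estimates) per field, as quoted in the abstract.
counts = {
    "medicine": (67_386, 597_699),
    "environmental sciences": (199, 12_707),
    "psychology": (605, 23_563),
    "economics": (327, 91_421),
}
# Median |d| (before, after) adjustment, as quoted in the abstract.
median_d = {
    "economics": (0.20, 0.07),
    "psychology": (0.37, 0.26),
    "environmental sciences": (0.62, 0.43),
    "medicine": (0.24, 0.13),
}

total_meta = sum(m for m, _ in counts.values())
total_effects = sum(e for _, e in counts.values())
print(total_meta, total_effects)  # 68517 725390

# Fraction of the unadjusted median effect removed by the adjustment.
relative_shrinkage = {
    field: 1.0 - after / before for field, (before, after) in median_d.items()
}
for field, s in sorted(relative_shrinkage.items(), key=lambda kv: -kv[1]):
    print(f"{field}: {s:.1%}")
```

The sums agree with "over 68,000" and "over 700,000". The ratio is not the paper's contamination measure: medicine, for instance, shows a larger relative drop in median |d| than psychology or environmental sciences even though the abstract ranks it least contaminated.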

Submitted 26 September, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

arXiv:2103.05841 [pdf, other]

Interpretable bias mitigation for textual data: Reducing gender bias in patient notes while maintaining classification performance

Authors: Joshua R. Minot, Nicholas Cheney, Marc Maier, Danne C. Elbers, Christopher M. Danforth, Peter Sheridan Dodds

Abstract: Medical systems in general, and patient treatment decisions and outcomes in particular, are affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is a growing interest in building algorithmic fairness into processes impacting patient care. Much of the work addressing this question has focused on biases encoded in language models -- statistical estimates of the relationships between concepts derived from distant reading of corpora. Building on this work, we investigate how word choices made by healthcare practitioners and language models interact with regard to bias. We identify and remove gendered language from two clinical-note datasets and describe a new debiasing procedure using BERT-based gender classifiers. We show minimal degradation in health condition classification tasks for low to medium levels of bias removal via data augmentation. Finally, we compare the bias semantically encoded in the language models with the bias empirically observed in health records. This work outlines an interpretable approach for using data augmentation to identify and reduce the potential for bias in natural language processing pipelines.
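The abstract's debiasing procedure relies on BERT-based gender classifiers, which are beyond a short example. As a much simpler, hedged sketch of only the "identify and remove gendered language" idea, the code below masks whole-word matches from a small lexicon. The term list, the `[MASK]` token, and the function name are hypothetical choices, not taken from the paper.

```python
import re

# Hypothetical, deliberately small lexicon; a real study would need a curated
# list and would still miss context-dependent gender cues.
GENDERED_TERMS = [
    "he", "she", "him", "her", "his", "hers", "himself", "herself",
    "man", "woman", "men", "women", "male", "female",
    "mr", "mrs", "ms",
]
# Whole-word, case-insensitive alternation over the lexicon.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, GENDERED_TERMS)) + r")\b",
    flags=re.IGNORECASE,
)


def mask_gendered(text: str, token: str = "[MASK]") -> str:
    """Replace every whole-word match of a lexicon term with `token`."""
    return _PATTERN.sub(token, text)


print(mask_gendered("She reports that her knee pain started last week."))
```

The word boundaries keep substrings such as "he" in "Therapist" untouched; a lexicon approach cannot, however, catch gender signals carried by other words, which is one motivation for classifier-based methods.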

Submitted 9 March, 2021; originally announced March 2021.

Comments: 31 pages, 22 figures

arXiv:1904.02052 [pdf, other]

doi 10.5194/isprs-annals-IV-2-W5-609-2019

Estimating Chlorophyll a Concentrations of Several Inland Waters with Hyperspectral Data and Machine Learning Models

Authors: Philipp M. Maier, Sina Keller

Abstract: Water is a key component of life, the natural environment and human health. For monitoring the conditions of a water body, the chlorophyll a concentration can serve as a proxy for nutrients and oxygen supply. In situ measurements of water quality parameters are often time-consuming, expensive and limited in areal validity. Therefore, we apply remote sensing techniques. During field campaigns, we collected hyperspectral data with a spectrometer and measured chlorophyll a concentrations in situ for 13 inland water bodies with different spectral characteristics. One objective of this study is to estimate chlorophyll a concentrations of these inland waters by applying three machine learning regression models: Random Forest, Support Vector Machine and an Artificial Neural Network. Additionally, we simulate four different hyperspectral resolutions of the spectrometer data to investigate their effect on estimation performance. Furthermore, we evaluate how using first-order derivatives of the spectra affects regression performance. This study reveals the potential of combining machine learning approaches and remote sensing data for inland waters. Each machine learning model achieves an R² score between 80% and 90% for the regression on chlorophyll a concentrations. The random forest model clearly benefits from the derivatives of the spectra. In further studies, we will focus on applying machine learning models to spectral satellite data to enhance the area-wide estimation of chlorophyll a concentration for inland waters.
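The abstract mentions first-order derivatives of the spectra and the R² score without saying how either is computed. A minimal sketch, assuming forward differences between adjacent bands for the derivative and the usual definition R² = 1 − SS_res / SS_tot; the function names and the example spectrum are hypothetical.

```python
from typing import List, Sequence


def first_derivative(wavelengths: Sequence[float], reflectance: Sequence[float]) -> List[float]:
    """Forward-difference first derivative of a spectrum (one value per adjacent band pair)."""
    if len(wavelengths) != len(reflectance) or len(wavelengths) < 2:
        raise ValueError("need two equally long sequences with at least 2 bands")
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(wavelengths) - 1)
    ]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R² is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


# Hypothetical three-band spectrum (wavelengths in nm).
print(first_derivative([500.0, 510.0, 520.0], [0.02, 0.03, 0.05]))
```

An R² of 1 means perfect predictions, while predicting the mean of the targets everywhere gives 0; the 80% to 90% range reported above corresponds to R² values of 0.8 to 0.9.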

Submitted 3 April, 2019; originally announced April 2019.

Comments: Accepted at ISPRS Geospatial Week 2019 in Enschede

arXiv:1805.01361 [pdf, other]

doi 10.1109/WHISPERS.2018.8747010

Machine learning regression on hyperspectral data to estimate multiple water parameters

Authors: Philipp M. Maier, Sina Keller

Abstract: In this paper, we present a regression framework involving several machine learning models to estimate water parameters based on hyperspectral data. Measurements from a multi-sensor field campaign conducted on the River Elbe, Germany, represent the benchmark dataset. It contains hyperspectral data and five water parameters: chlorophyll a, green algae, diatoms, colored dissolved organic matter (CDOM), and turbidity. As a possible preprocessing step, we apply principal component analysis (PCA) to the high-dimensional data. We then evaluate the performance of the regression framework with and without this preprocessing step. The regression results clearly reveal the potential of estimating water parameters from hyperspectral data with machine learning. The proposed framework provides the basis for further investigations, such as adapting it to estimate water parameters of other inland waters.
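The abstract does not state how PCA was computed or how many components were kept. As a dependency-free sketch of the idea, the code below finds the top-k principal components by power iteration on the sample covariance matrix with deflation, then projects the centered data onto them. Power iteration is a swapped-in technique chosen for brevity; in practice a linear-algebra library's SVD would normally be used. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pca(X: Sequence[Sequence[float]], k: int, iters: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Top-k principal components via power iteration with deflation.

    Returns (components, scores): components is k x d, scores is n x k.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # centered data
    # Sample covariance matrix (d x d).
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(k):
        v = _normalize([rng.random() + 0.1 for _ in range(d)])  # random positive start
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            if all(x == 0.0 for x in w):
                break  # no variance left in the remaining subspace
            v = _normalize(w)
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))  # eigenvalue
        components.append(v)
        # Deflate so the next pass converges to the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]
    return components, scores


comps, scores = pca([[0, 0], [1, 2], [2, 4], [3, 6]], k=1)
print(comps[0])  # direction of the line y = 2x, up to sign
```

Points lying on the line y = 2x yield a first component proportional to (1, 2)/√5; in the regression framework, the resulting scores would replace the raw spectral bands as model inputs.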

Submitted 7 August, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

arXiv:1804.09046 [pdf, other]

doi 10.5194/isprs-annals-IV-1-101-2018

Developing a machine learning framework for estimating soil moisture with VNIR hyperspectral data

Authors: Sina Keller, Felix M. Riese, Johanna Stötzer, Philipp M. Maier, Stefan Hinz

Abstract: In this paper, we investigate the potential of estimating the soil-moisture content based on VNIR hyperspectral data combined with LWIR data. Measurements from a multi-sensor field campaign represent the benchmark dataset, which contains hyperspectral, LWIR, and soil-moisture data measured on a grassland site. We introduce a regression framework with three steps: feature selection, preprocessing, and well-chosen regression models. The regression models are mainly supervised machine learning models; the exception is self-organizing maps, which combine unsupervised and supervised learning. We analyze the impact of the different preprocessing methods on the regression results. Of all regression models, the extremely randomized trees model without preprocessing provides the best estimation performance. Our results reveal the potential of the regression framework combined with VNIR hyperspectral data to estimate soil moisture measured under real-world conditions. In conclusion, the results of this paper provide a basis for further improvements in different research directions.

Submitted 12 July, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

Comments: Accepted at ISPRS TC I Midterm Symposium Karlsruhe (October 2018)

Journal ref: ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., IV-1, 101-108, 2018

arXiv:1102.2075 [pdf, other]

How the result of graph clustering methods depends on the construction of the graph

Authors: Markus Maier, Ulrike von Luxburg, Matthias Hein

Abstract: We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is whether and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end, we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graphs. We provide examples to illustrate the implications for spectral clustering.
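The abstract names the normalized cut and the Cheeger cut as cluster quality measures. A minimal sketch on a finite weighted graph, assuming the common definitions NCut(A, Ā) = cut(A, Ā) · (1/vol(A) + 1/vol(Ā)) and Cheeger cut = cut(A, Ā) / min(vol(A), vol(Ā)), where vol sums vertex degrees. The paper studies limits of suitably rescaled versions on random geometric graphs, so its exact normalization may differ; the function name and example graph are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def cut_measures(weights: Dict[Edge, float], part: Set[int], nodes: Set[int]) -> Tuple[float, float]:
    """Return (normalized cut, Cheeger cut) of the partition (part, nodes - part).

    `weights` maps each undirected edge (i, j), listed once, to its weight w_ij > 0.
    """
    other = nodes - part
    if not part or not other:
        raise ValueError("both sides of the partition must be non-empty")
    degree = {v: 0.0 for v in nodes}
    cut = 0.0
    for (i, j), w in weights.items():
        degree[i] += w
        degree[j] += w
        if (i in part) != (j in part):
            cut += w  # edge crosses the partition
    vol_a = sum(degree[v] for v in part)
    vol_b = sum(degree[v] for v in other)
    ncut = cut / vol_a + cut / vol_b
    cheeger = cut / min(vol_a, vol_b)
    return ncut, cheeger


# Two triangles joined by a single bridge edge (2, 3); all weights 1.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
         (2, 3): 1.0}
print(cut_measures(edges, {0, 1, 2}, set(range(6))))
```

Splitting at the bridge gives a cut of 1 and volumes of 7 on each side, so NCut = 2/7 and the Cheeger cut is 1/7. Building a different graph on the same points changes the weights and degrees, and therefore these values, which is the dependence the paper analyzes.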

Submitted 10 February, 2011; originally announced February 2011.

arXiv:0912.3408 [pdf, other]

doi 10.1016/j.tcs.2009.01.009

Optimal construction of k-nearest neighbor graphs for identifying noisy clusters

Authors: Markus Maier, Matthias Hein, Ulrike von Luxburg

Abstract: We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (on the order of n rather than log n) to maximize the probability of cluster identification. Second, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect only the most significant cluster.
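The two graph types compared above follow standard definitions: in the symmetric k-nearest neighbor graph, points i and j are connected if either is among the other's k nearest neighbors; in the mutual graph, both must be. Since clusters are identified via connected components, the sketch below builds both graphs and counts components with union-find. It uses Euclidean distance and breaks distance ties by index; the function names and the four-point example are hypothetical, and no density level sets or noise model are involved.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def knn_sets(points: List[Point], k: int) -> List[Set[int]]:
    """Indices of the k nearest neighbors of each point (ties broken by index)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in order[:k]})
    return neighbors


def knn_graph(points: List[Point], k: int, mutual: bool) -> Set[Tuple[int, int]]:
    """Edges (i, j) with i < j of the mutual or symmetric kNN graph."""
    nbrs = knn_sets(points, k)
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            a, b = j in nbrs[i], i in nbrs[j]
            if (a and b) if mutual else (a or b):
                edges.add((i, j))
    return edges


def count_components(n: int, edges: Set[Tuple[int, int]]) -> int:
    """Number of connected components, via union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


pts = [(0.0,), (1.0,), (3.0,), (10.0,)]
for mutual in (False, True):
    g = knn_graph(pts, k=1, mutual=mutual)
    print("mutual" if mutual else "symmetric", sorted(g), count_components(len(pts), g))
```

On this toy input with k = 1, the symmetric graph is connected, while the mutual graph keeps only the edge (0, 1) and falls apart into three components. Small k fragments the mutual graph easily, which is in line with the abstract's point that k must be chosen large.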

Submitted 17 December, 2009; originally announced December 2009.

Comments: 31 pages, 2 figures

Journal ref: Theoretical Computer Science, 410(19), 1749-1764, April 2009

Showing 1–8 of 8 results for author: Maier, M