Noise robustness of persistent homology on greyscale images, across filtrations and signatures
Authors:
Renata Turkeš,
Jannes Nys,
Tim Verdonck,
Steven Latré
Abstract:
Topological data analysis is a recent and fast growing field that approaches the analysis of datasets using techniques from (algebraic) topology. Its main tool, persistent homology (PH), has seen a notable increase in applications in the last decade. Often cited as the most favourable property of PH and the main reason for practical success are the stability theorems that give theoretical results…
▽ More
Topological data analysis is a recent and fast growing field that approaches the analysis of datasets using techniques from (algebraic) topology. Its main tool, persistent homology (PH), has seen a notable increase in applications in the last decade. Often cited as the most favourable property of PH and the main reason for practical success are the stability theorems that give theoretical results about noise robustness, since real data is typically contaminated with noise or measurement errors. However, little attention has been paid to what these stability theorems mean in practice. To gain some insight into this question, we evaluate the noise robustness of PH on the MNIST dataset of greyscale images. More precisely, we investigate to what extent PH changes under typical forms of image noise, and quantify the loss of performance in classifying the MNIST handwritten digits when noise is added to the data. The results show that the sensitivity to noise of PH is influenced by the choice of filtrations and persistence signatures (respectively the input and output of PH), and in particular, that PH features are often not robust to noise in a classification task.
△ Less
Submitted 17 August, 2021; v1 submitted 16 August, 2021;
originally announced August 2021.
Concordance probability in a big data setting: application in non-life insurance
Authors:
Robin Van Oirbeek,
Christopher Grumiau,
Tim Verdonck
Abstract:
The concordance probability or C-index is a popular measure to capture the discriminatory ability of a regression model. In this article, the definition of this measure is adapted to the specific needs of the frequency and severity model, typically used during the technical pricing of a non-life insurance product. Due to the typical large sample size of the frequency data in particular, two differ…
▽ More
The concordance probability or C-index is a popular measure to capture the discriminatory ability of a regression model. In this article, the definition of this measure is adapted to the specific needs of the frequency and severity model, typically used during the technical pricing of a non-life insurance product. Due to the typical large sample size of the frequency data in particular, two different adaptations of the estimation procedure of the concordance probability are presented. Note that the latter procedures can be applied to all different versions of the concordance probability.
△ Less
Submitted 14 November, 2019;
originally announced November 2019.