-
Metrics for Learning in Topological Persistence
Authors:
Henri Riihimäki,
José Licón-Saláiz
Abstract:
Persistent homology analysis provides a means to capture the connectivity structure of data sets in various dimensions. On the mathematical level, by defining a metric between the objects that persistence attaches to data sets, we can stabilize invariants characterizing these objects. We outline how so-called contour functions induce relevant metrics for stabilizing the rank invariant. On the practical level, the stable ranks are used as fingerprints for data. Different choices of contour lead to different stable ranks, and topological learning is then the question of finding the optimal contour. We outline our analysis pipeline and show how it can enhance classification of physical activities data. As our main application we study how stable ranks and contours provide robust descriptors of spatial patterns of atmospheric cloud fields.
Submitted 11 June, 2019;
originally announced June 2019.
-
A topological data analysis based classification method for multiple measurements
Authors:
Henri Riihimäki,
Wojciech Chachólski,
Jakob Theorell,
Jan Hillert,
Ryan Ramanujam
Abstract:
Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applied to two case studies, its accuracy exceeds that of alternative models, with additional benefits such as reporting data subsets with high purity along with feature values. For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints and improved to 90% when sampling was increased to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model. This algorithm and software can be beneficial for the repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
Submitted 5 April, 2019;
originally announced April 2019.
-
Generalized persistence analysis based on stable rank invariant
Authors:
Henri Riihimäki,
Wojciech Chacholski
Abstract:
We believe three ingredients are needed for further progress in persistence and its use: invariants not relying on decomposition theorems to go beyond one dimension, outcomes suitable for statistical analysis, and a setup adopted for supervised and machine learning. Stable rank, a continuous invariant for multidimensional persistence, was introduced in W. Chacholski et al. - Multidimensional persistence and noise, 2017. In the current paper we continue this work by demonstrating how one builds an efficient computational pipeline around this invariant and uses it for inference in the one-parameter case. We present computational evidence of the statistical stability of stable rank. We also show how our framework can be used in supervised learning.
Submitted 13 June, 2018;
originally announced July 2018.