-
Counts and end-curves in two-parameter persistence
Authors:
Thomas Brüstle,
Steve Oudot,
Luis Scoccola,
Hugh Thomas
Abstract:
Given a finite dimensional, bigraded module over the polynomial ring in two variables, we define its two-parameter count, a natural number, and its end-curves, a set of plane curves. These are two-dimensional analogues of the notions of bar-count and endpoints of singly-graded modules over the polynomial ring in one variable, from persistence theory. We show that our count is the unique one satisfying certain natural conditions; as a consequence, several inclusion-exclusion-type formulas in two-parameter persistence yield the same positive number, which equals our count, and which in turn equals the number of end-curves, giving geometric meaning to this count. We show that the end-curves determine the classical Betti tables by showing that they interpolate between generators, relations, and syzygies. Using the band representations of a certain string algebra, we show that the set of end-curves admits a canonical partition, where each part forms a closed curve on the plane; we call this the boundary of the module. As an invariant, the boundary is neither weaker nor stronger than the rank invariant, but, in contrast to the rank invariant, it is a complete invariant on the set of spread-decomposable representations. Our results connect several lines of work in multiparameter persistence, and their extension to modules over the real-exponent polynomial ring in two variables relates to two-dimensional Morse theory.
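For intuition, here is the one-parameter notion that the two-parameter count generalizes. For a module indexed by $\{0, \dots, n\}$, the multiplicity of the bar $[i, j]$ can be recovered from the rank function $r(i, j) = \operatorname{rank}(M(i) \to M(j))$ by the classical inclusion-exclusion formula $r(i,j) - r(i-1,j) - r(i,j+1) + r(i-1,j+1)$, and the bar-count is the sum of these multiplicities. The minimal Python sketch below assumes the ranks are given as a nested list `rank[i][j]` (a hypothetical input format). It is only the one-parameter formula, not the two-parameter count defined in the paper.

```python
def bar_multiplicities(rank):
    """Return {(i, j): multiplicity} for a module indexed by 0..n.

    rank[i][j] is the rank of the structure map M(i) -> M(j) for i <= j,
    and rank[i][i] is dim M(i). Entries with i > j are never read.
    """
    n = len(rank) - 1

    def r(i, j):
        # Ranks outside the index range (or with i > j) count as zero.
        if i < 0 or j > n or i > j:
            return 0
        return rank[i][j]

    mult = {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            # Bars that start exactly at i and end exactly at j.
            m = r(i, j) - r(i - 1, j) - r(i, j + 1) + r(i - 1, j + 1)
            if m:
                mult[(i, j)] = m
    return mult


def bar_count(rank):
    """Total number of bars, counted with multiplicity."""
    return sum(bar_multiplicities(rank).values())
```

For the module $[0,1] \oplus [1,2]$ on $\{0, 1, 2\}$, the ranks `[[1, 1, 0], [0, 2, 1], [0, 0, 1]]` yield the two bars `(0, 1)` and `(1, 2)`, so the bar-count is 2.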
Submitted 19 May, 2025;
originally announced May 2025.
-
Cover Learning for Large-Scale Topology Representation
Authors:
Luis Scoccola,
Uzu Lim,
Heather A. Harrington
Abstract:
Classical unsupervised learning methods like clustering and linear dimensionality reduction parametrize large-scale geometry when it is discrete or linear, while more modern methods from manifold learning find low-dimensional representations or infer local geometry by constructing a graph on the input data. More recently, topological data analysis popularized the use of simplicial complexes to represent data topology with two main methodologies: topological inference with geometric complexes and large-scale topology visualization with Mapper graphs -- central to these is the nerve construction from topology, which builds a simplicial complex given a cover of a space by subsets. While successful, these have limitations: geometric complexes scale poorly with data size, and Mapper graphs can be hard to tune and only contain low-dimensional information. In this paper, we propose to study the problem of learning covers in its own right, and from the perspective of optimization. We describe a method for learning topologically faithful covers of geometric datasets, and show that the simplicial complexes thus obtained can outperform standard topological inference approaches in terms of size, and Mapper-type algorithms in terms of representation of large-scale topology.
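The nerve construction mentioned above can be sketched in a few lines of Python. This brute-force illustration assumes the cover is given as a list of sets of data-point indices; it shows only the step that turns a cover into a simplicial complex, not the cover-learning method proposed in the paper.

```python
from itertools import combinations


def nerve(cover, max_dim=2):
    """Simplices of the nerve of `cover`, up to dimension max_dim.

    `cover` is a list of collections of data-point indices. A set of k + 1
    cover elements spans a k-simplex exactly when their common intersection
    is non-empty. Simplices are returned as sorted tuples of cover indices.
    """
    simplices = []
    for size in range(1, max_dim + 2):  # `size` cover elements -> (size-1)-simplex
        for idx in combinations(range(len(cover)), size):
            common = set.intersection(*(set(cover[i]) for i in idx))
            if common:
                simplices.append(idx)
    return simplices
```

Covering six points on a circle by three arcs that overlap pairwise but have no common point, `[{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]`, gives three vertices and three edges but no triangle: a hollow triangle, which has the homotopy type of a circle.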
Submitted 12 March, 2025;
originally announced March 2025.
-
Decomposing zero-dimensional persistent homology over rooted tree quivers
Authors:
Riju Bindua,
Thomas Brüstle,
Luis Scoccola
Abstract:
Given a functor from any category into the category of topological spaces, one obtains a linear representation of the category by post-composing the given functor with a homology functor with field coefficients. This construction is fundamental in persistence theory, where it is known as persistent homology, and where the category is typically a poset. Persistence theory is particularly successful when the poset is a finite linearly ordered set, owing to the fact that in this case its category of representations is of finite type. We show that when the poset is a rooted tree poset (a poset with a maximum and whose Hasse diagram is a tree) the additive closure of the category of representations obtainable as zero-dimensional persistent homology is of finite type, and give a quadratic-time algorithm for decomposition into indecomposables. In doing this, we give an algebraic characterization of the additive closure in terms of Ringel's tree modules, and show that its indecomposable objects are the reduced representations of Kinser.
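In the simplest case, where the poset is a finite linearly ordered set, zero-dimensional persistent homology of a filtered graph is computed by a union-find structure together with the elder rule. The sketch below assumes a graph given by vertex values, with each edge entering at the larger of its endpoint values (a lower-star filtration). It is the standard one-parameter illustration, not the paper's quadratic-time algorithm for rooted tree posets.

```python
def zero_dim_barcode(values, edges):
    """0-dimensional sublevel-set barcode of a vertex function on a graph.

    values[v] is the filtration value of vertex v, and an edge (u, v) enters
    at max(values[u], values[v]). Returns a sorted list of (birth, death)
    pairs; death is float("inf") for components that never merge, and
    zero-length bars are dropped.
    """
    parent = list(range(len(values)))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        t = max(values[u], values[v])
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle; H_0 does not change
        # Each root is the oldest vertex of its component; make ru the elder.
        if (values[ru], ru) > (values[rv], rv):
            ru, rv = rv, ru
        # Elder rule: the younger component dies when the two merge.
        if values[rv] < t:
            bars.append((values[rv], t))
        parent[rv] = ru
    # Surviving roots give essential (infinite) bars.
    bars.extend((values[v], float("inf")) for v in range(len(values)) if find(v) == v)
    return sorted(bars)
```

On the path graph `0 - 1 - 2` with values `[0, 2, 1]`, the component born at 1 merges into the older one at time 2, giving the bars `(0, inf)` and `(1, 2)`.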
Submitted 28 November, 2024;
originally announced November 2024.
-
Computing Betti tables and minimal presentations of zero-dimensional persistent homology
Authors:
Dmitriy Morozov,
Luis Scoccola
Abstract:
The Betti tables of a multigraded module encode the grades at which there is an algebraic change in the module. Multigraded modules show up in many areas of pure and applied mathematics, and in particular in topological data analysis, where they are known as persistence modules, and where their Betti tables describe the places at which the homology of filtered simplicial complexes changes. Although Betti tables of singly and bigraded modules are already being used in applications of topological data analysis, their computation in the bigraded case (which relies on an algorithm that is cubic in the size of the filtered simplicial complex) is a bottleneck when working with large datasets. We show that, in the special case of $0$-dimensional homology (which is relevant for clustering and graph classification) the Betti tables of a bigraded module can be computed in log-linear time. We also consider the problem of computing minimal presentations, and show that a minimal presentation of $0$-dimensional persistent homology can be computed in quadratic time, regardless of the grading poset.
Submitted 29 October, 2024;
originally announced October 2024.
-
Differentiability and Optimization of Multiparameter Persistent Homology
Authors:
Luis Scoccola,
Siddharth Setlur,
David Loiseaux,
Mathieu Carrière,
Steve Oudot
Abstract:
Real-valued functions on geometric data -- such as node attributes on a graph -- can be optimized using descriptors from persistent homology, allowing the user to incorporate topological terms in the loss function. When optimizing a single real-valued function (the one-parameter setting), there is a canonical choice of descriptor for persistent homology: the barcode. The operation mapping a real-valued function to its barcode is differentiable almost everywhere, and the convergence of gradient descent for losses using barcodes is relatively well understood. When optimizing a vector-valued function (the multiparameter setting), there is no unique choice of descriptor for multiparameter persistent homology, and many distinct descriptors have been proposed. This calls for the development of a general framework for differentiability and optimization that applies to a wide range of multiparameter homological descriptors. In this article, we develop such a framework and show that it encompasses well-known descriptors of different flavors, such as signed barcodes and the multiparameter persistence landscape. We complement the theory with numerical experiments supporting the idea that optimizing multiparameter homological descriptors can lead to improved performance compared to optimizing one-parameter descriptors, even when using the simplest and most efficiently computable multiparameter descriptors.
Submitted 30 August, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures
Authors:
David Loiseaux,
Luis Scoccola,
Mathieu Carrière,
Magnus Bakke Botnan,
Steve Oudot
Abstract:
Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarcity of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.
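As a rough illustration of this kind of vectorization, the sketch below treats a signed barcode as a finite signed point measure in the plane (points with $\pm 1$ weights) and convolves it with a Gaussian kernel, evaluating the result on a grid. The point-measure input format, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for illustration; this is not the code released with the paper.

```python
import numpy as np


def signed_measure_image(points, weights, grid_x, grid_y, bandwidth=1.0):
    """Evaluate a Gaussian-smoothed signed point measure on a grid.

    points: array of shape (m, 2); weights: array of shape (m,) holding the
    signed masses (e.g. +1 / -1). Returns an array of shape
    (len(grid_x), len(grid_y)) whose entry at (x, y) is
    sum_k weights[k] * exp(-|(x, y) - points[k]|^2 / (2 * bandwidth^2)).
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    grid = np.stack([gx, gy], axis=-1)                    # (nx, ny, 2)
    # Squared distance from every grid node to every point: (nx, ny, m).
    diff = grid[:, :, None, :] - points[None, None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq / (2 * bandwidth ** 2))
    return kernel @ weights                               # contract over the m points
```

Because the measure is signed, positive and negative masses partially cancel after smoothing; the flattened image can then be used as a feature vector.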
Submitted 7 February, 2024; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Toroidal Coordinates: Decorrelating Circular Coordinates With Lattice Reduction
Authors:
Luis Scoccola,
Hitesh Gakhar,
Johnathan Bush,
Nikolas Schonsheck,
Tatum Rask,
Ling Zhou,
Jose A. Perea
Abstract:
The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a $1$-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be "geometrically correlated" even if the chosen cohomology classes are linearly independent. It is shown in the original work that less correlated maps can be obtained with suitable integer linear combinations of the cohomology classes, with the linear combinations being chosen by inspection. In this paper, we identify a formal notion of geometric correlation between circle-valued maps which, in the Riemannian manifold case, corresponds to the Dirichlet form, a bilinear form derived from the Dirichlet energy. We describe a systematic procedure for constructing low energy torus-valued maps on data, starting from a set of linearly independent cohomology classes. We showcase our procedure with computational examples. Our main algorithm is based on the Lenstra--Lenstra--Lovász algorithm from computational number theory.
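The Lenstra--Lenstra--Lovász algorithm reduces a lattice basis; in dimension two it specializes to the classical Lagrange-Gauss reduction. The sketch below assumes that the geometric correlation between two cohomology classes is summarized by a symmetric positive definite $2 \times 2$ Gram matrix `G` (a hypothetical input, e.g. the Dirichlet form evaluated on the two classes) and searches for integer combinations of the classes with small energy. It covers only the two-class case and is not the paper's general procedure.

```python
def gram_form(G, u, v):
    """Bilinear form u^T G v for integer coefficient vectors u, v in Z^2."""
    return sum(u[i] * G[i][j] * v[j] for i in range(2) for j in range(2))


def lagrange_reduce(G):
    """Lagrange-Gauss reduction of the standard basis of Z^2 with respect to G.

    Returns integer vectors (b1, b2) that still span Z^2, where b1 has the
    smallest energy u^T G u among nonzero integer vectors and b2 is as
    small as possible among vectors completing b1 to a basis.
    """
    b1, b2 = (1, 0), (0, 1)
    while True:
        if gram_form(G, b1, b1) > gram_form(G, b2, b2):
            b1, b2 = b2, b1                    # keep the lower-energy vector first
        mu = round(gram_form(G, b1, b2) / gram_form(G, b1, b1))
        if mu == 0:
            return b1, b2                      # |projection coefficient| <= 1/2: reduced
        b2 = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])
```

For example, with `G = [[5, 6], [6, 8]]` the energies of the basis vectors drop from 5 and 8 to 1 and 4, and the new basis `(-1, 1)`, `(2, -1)` has determinant $-1$, so it still spans $\mathbb{Z}^2$.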
Submitted 15 March, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
On the bottleneck stability of rank decompositions of multi-parameter persistence modules
Authors:
Magnus Bakke Botnan,
Steffen Oppermann,
Steve Oudot,
Luis Scoccola
Abstract:
A significant part of modern topological data analysis is concerned with the design and study of algebraic invariants of poset representations -- often referred to as multi-parameter persistence modules. One such invariant is the minimal rank decomposition, which encodes the ranks of all the structure morphisms of the persistence module by a single ordered pair of rectangle-decomposable modules, interpreted as a signed barcode. This signed barcode generalizes the concept of persistence barcode from one-parameter persistence to any number of parameters, raising the question of its bottleneck stability. We show in this paper that the minimal rank decomposition is not stable under the natural notion of signed bottleneck matching between signed barcodes. We remedy this by turning our focus to the rank exact decomposition, a related signed barcode induced by the minimal projective resolution of the module relative to the so-called rank exact structure, which we prove to be bottleneck stable under signed matchings. As part of our proof, we obtain two intermediate results of independent interest: we compute the global dimension of the rank exact structure on the category of finitely presentable multi-parameter persistence modules, and we prove a bottleneck stability result for hook-decomposable modules. We also give a bound for the size of the rank exact decomposition that is polynomial in the size of the usual minimal projective resolution, we prove a universality result for the dissimilarity function induced by the notion of signed matching, and we compute, in the two-parameter case, the global dimension of a different exact structure related to the upsets of the indexing poset. This set of results combines concepts from topological data analysis and from the representation theory of posets, which we believe makes it relevant to both areas.
Submitted 30 August, 2024; v1 submitted 30 July, 2022;
originally announced August 2022.
-
FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles
Authors:
Luis Scoccola,
Jose A. Perea
Abstract:
Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view, and, as an application, we describe an algorithm which takes as input a dataset together with an initial representation of it in Euclidean space, assumed to recover part of its large scale topology, and outputs a new representation that integrates local representations, obtained through local linear dimensionality reduction, along the initial global representation. We demonstrate this algorithm on examples coming from dynamical systems and chemistry. In these examples, our algorithm is able to learn topologically faithful embeddings of the data in lower target dimension than various well-known metric-based dimensionality reduction algorithms.
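The local linear dimensionality reduction step can be illustrated with local principal component analysis: for each point, take its `k` nearest neighbors and keep the top `d` principal directions of that patch. The sketch below assumes a NumPy point cloud and uses hypothetical parameters `k` and `d`. It only computes local frames and does not perform the paper's integration of local representations along the initial global representation.

```python
import numpy as np


def local_pca_frames(X, k=10, d=1):
    """Top-d principal directions of each point's k-nearest-neighbor patch.

    X: array of shape (n, D). Returns an array of shape (n, D, d) whose i-th
    entry is an orthonormal basis of the top-d local principal subspace
    around X[i] (each direction is determined only up to sign).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    neighbors = np.argsort(sq, axis=1)[:, :k]          # includes the point itself
    frames = []
    for idx in neighbors:
        patch = X[idx] - X[idx].mean(axis=0)           # center the patch
        # Right singular vectors are the principal directions of the patch.
        _, _, vt = np.linalg.svd(patch, full_matrices=False)
        frames.append(vt[:d].T)
    return np.stack(frames)
```

The brute-force distance matrix uses memory quadratic in the number of points; for large datasets a spatial index would be the natural replacement.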
Submitted 15 March, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
On the stability of multigraded Betti numbers and Hilbert functions
Authors:
Steve Oudot,
Luis Scoccola
Abstract:
Multigraded Betti numbers are one of the simplest invariants of multiparameter persistence modules. This invariant is useful in theory -- it completely determines the Hilbert function of the module and the isomorphism type of the free modules in its minimal free resolution -- as well as in practice -- it is easy to visualize and it is one of the main outputs of current multiparameter persistent homology software, such as RIVET. However, to the best of our knowledge, no bottleneck stability result with respect to the interleaving distance has been established for this invariant so far, and this potential lack of stability limits its practical applications. We prove a stability result for multigraded Betti numbers, using an efficiently computable bottleneck-type dissimilarity function we introduce. Our notion of matching is inspired by recent work on signed barcodes, and allows matching bars of the same module in homological degrees of different parity, in addition to matching bars of different modules in homological degrees of the same parity. Our stability result is a combination of Hilbert's syzygy theorem, Bjerkevik's bottleneck stability for free modules, and a novel stability result for projective resolutions. We also prove, in the $2$-parameter case, a $1$-Wasserstein stability result for Hilbert functions with respect to the $1$-presentation distance of Bjerkevik and Lesnick.
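The fact that multigraded Betti numbers determine the Hilbert function follows from the minimal free resolution: a free summand generated in grade $g$ contributes a one-dimensional space at every grade $x \geq g$, so $\dim M(x) = \sum_i (-1)^i \sum_{g \leq x} \beta_i(g)$. A minimal Python sketch for the $2$-parameter case, assuming the Betti tables are given as a list of dictionaries mapping grades in $\mathbb{Z}^2$ to multiplicities (a hypothetical input format):

```python
def hilbert_function(betti, x):
    """dim M(x) from the Betti tables of a finitely presented 2-parameter module.

    betti[i] maps a grade (a, b) to the number of free summands generated
    in that grade in homological degree i of a minimal free resolution.
    """
    total = 0
    for i, table in enumerate(betti):
        # A free module generated at g has dimension 1 at every x >= g.
        reached = sum(m for (a, b), m in table.items() if a <= x[0] and b <= x[1])
        total += (-1) ** i * reached
    return total
```

For the simple module supported only at $(0, 0)$, the resolution has $\beta_0 = 1$ at $(0, 0)$, $\beta_1 = 1$ at $(1, 0)$ and $(0, 1)$, and $\beta_2 = 1$ at $(1, 1)$; the alternating sum gives dimension 1 at $(0, 0)$ and 0 at every other grade.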
Submitted 7 February, 2024; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Approximate and discrete Euclidean vector bundles
Authors:
Luis Scoccola,
Jose A. Perea
Abstract:
We introduce $\varepsilon$-approximate versions of the notion of Euclidean vector bundle for $\varepsilon \geq 0$, which recover the classical notion of Euclidean vector bundle when $\varepsilon = 0$. In particular, we study Čech cochains with coefficients in the orthogonal group that satisfy an approximate cocycle condition. We show that $\varepsilon$-approximate vector bundles can be used to represent classical vector bundles when $\varepsilon > 0$ is sufficiently small. We also introduce distances between approximate vector bundles and use them to prove that sufficiently similar approximate vector bundles represent the same classical vector bundle. This gives a way of specifying vector bundles over finite simplicial complexes using a finite amount of data, and also allows for some tolerance to noise when working with vector bundles in an applied setting. As an example, we prove a reconstruction theorem for vector bundles from finite samples. We give algorithms for the effective computation of low-dimensional characteristic classes of vector bundles directly from discrete and approximate representations and illustrate the usage of these algorithms with computational examples.
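The approximate cocycle condition can be checked directly on a finite cover. The sketch below assumes transition matrices are stored per ordered pair of cover indices, uses the convention $\Omega_{ij}\Omega_{jk} = \Omega_{ik}$, and measures the defect in the Frobenius norm; the paper's choice of norm and conventions may differ, so this is only an illustration. In this simplified sense, a cochain is $\varepsilon$-approximate when the returned defect is at most $\varepsilon$.

```python
import numpy as np


def cocycle_defect(transitions, triangles):
    """Largest deviation from the cocycle condition over the given triangles.

    transitions[(i, j)] is an orthogonal matrix Omega_ij, and Omega_ji is
    taken to be its transpose. For each triangle (i, j, k) the defect is
    the Frobenius norm of Omega_ij @ Omega_jk - Omega_ik.
    """
    def omega(i, j):
        # For orthogonal matrices the transpose is the inverse, so a
        # reversed edge can be recovered from the stored one.
        if (i, j) in transitions:
            return transitions[(i, j)]
        return transitions[(j, i)].T

    return max(
        (np.linalg.norm(omega(i, j) @ omega(j, k) - omega(i, k)) for i, j, k in triangles),
        default=0.0,
    )
```

With planar rotations, the cochain $R(0.3), R(0.5), R(0.8)$ on the edges $(0,1), (1,2), (0,2)$ is an exact cocycle, while replacing $R(0.8)$ by $R(0.9)$ gives a defect of about $0.14$.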
Submitted 7 February, 2024; v1 submitted 15 April, 2021;
originally announced April 2021.
-
The Integers as a Higher Inductive Type
Authors:
Thorsten Altenkirch,
Luis Scoccola
Abstract:
We consider the problem of defining the integers in Homotopy Type Theory (HoTT). We can define the type of integers as signed natural numbers (i.e., using a coproduct), but its induction principle is very inconvenient to work with, since it leads to an explosion of cases. An alternative is to use set-quotients, but here we need to use set-truncation to avoid non-trivial higher equalities. This results in a recursion principle that only allows us to define functions into sets (types satisfying UIP). In this paper we consider higher inductive types using either a small universe or bi-invertible maps. These types represent the integers without explicit set-truncation and are equivalent to the usual coproduct representation. This is an interesting example since it shows how some coherence problems can be handled in HoTT. We discuss some open questions triggered by this work. The proofs have been formally verified using cubical Agda.
Submitted 30 June, 2020;
originally announced July 2020.
-
Stable and consistent density-based clustering via multiparameter persistence
Authors:
Alexander Rolle,
Luis Scoccola
Abstract:
We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent, using the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from one-parameter hierarchical clusterings, which is stable with respect to the correspondence-interleaving distance. Finally, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark datasets, showing that it identifies multi-scale cluster structure in data.
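Degree-Rips is a two-parameter object; fixing both a distance scale `r` and a degree threshold `k` gives a single graph whose connected components form a clustering. The sketch below assumes Euclidean data and the convention that a point's degree counts the other points within distance `r`. It computes one such slice only, and is neither the Persistable pipeline nor the paper's more stable slicing strategy.

```python
import numpy as np


def degree_rips_clusters(X, r, k):
    """Clusters of one degree-Rips slice at scale r and degree threshold k.

    A point survives if at least k other points lie within distance r;
    surviving points within distance r of each other are joined by an edge,
    and the clusters are the connected components. Returns a list of sorted
    index lists; points that do not survive belong to no cluster.
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    adjacent = dist <= r
    np.fill_diagonal(adjacent, False)                  # a point is not its own neighbor
    alive = np.flatnonzero(adjacent.sum(axis=1) >= k).tolist()
    alive_set = set(alive)
    seen, clusters = set(), []
    for start in alive:
        if start in seen:
            continue
        # Depth-first search restricted to surviving points.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.flatnonzero(adjacent[u]).tolist():
                if v in alive_set and v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(comp))
    return clusters
```

Raising `k` removes low-density points before components are formed, which is what makes the construction density-sensitive; sweeping `r` with `k` fixed gives one of the one-parameter slices discussed above.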
Submitted 3 August, 2023; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Visualization tools for parameter selection in cluster analysis
Authors:
Alexander Rolle,
Luis Scoccola
Abstract:
We propose an algorithm, HPREF (Hierarchical Partitioning by Repeated Features), that produces a hierarchical partition of a set of clusterings of a fixed dataset, such as sets of clusterings produced by running a clustering algorithm with a range of parameters. This gives geometric structure to such sets of clusterings, and can be used to visualize the set of results one obtains by running a clustering algorithm with a range of parameters.
Submitted 28 September, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.