Computing Projective Implicit Representations from Poset Towers

Abstract: A family of simplicial complexes, connected with simplicial maps and indexed by a poset $P$, is called a poset tower. The concept of poset towers subsumes classical objects of study in the persistence literature, as, for example, one-critical multi-filtrations and zigzag filtrations, but also allows multi-critical simplices and arbitrary simplicial maps. The homology of a poset tower gives rise to a $P$-persistence module. To compute this homology globally over $P$, in the spirit of the persistence algorithm, we consider the homology of a chain complex of $P$-persistence modules, $C_{\ell-1}\xleftarrow{}C_\ell\xleftarrow{}C_{\ell+1}$, induced by the simplices of the poset tower. Contrary to the case of one-critical filtrations, the chain-modules $C_\ell$ of a poset tower can have a complicated structure. In this work, we tackle the problem of computing a representation of such a chain complex segment by projective modules and $P$-graded matrices, which we call a projective implicit representation (PiRep). We give efficient algorithms to compute asymptotically minimal projective resolutions (up to the second term) of the chain modules and the boundary maps and compute a PiRep from these resolutions. Our algorithms are tailored to the chain complexes and resolutions coming from poset towers and take advantage of their special structure. In the context of poset towers, they are fully general and could potentially serve as a foundation for developing more efficient algorithms on specific posets.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2502.19369 [pdf, other]

Computing Connection Matrix and Persistence Efficiently from a Morse Decomposition

Authors: Tamal K. Dey, Michał Lipiński, Andrew Haas

Abstract: Morse decompositions partition the flows in a vector field into equivalent structures. Given such a decomposition, one can define a further summary of its flow structure by what is called a connection matrix. These matrices, a generalization of Morse boundary operators from classical Morse theory, capture the connections made by the flows among the critical structures (such as attractors, repellers, and orbits) in a vector field. Recently, in the context of combinatorial dynamics, an efficient persistence-like algorithm to compute connection matrices has been proposed in [DLMS24]. We show that, actually, the classical persistence algorithm with exhaustive reduction retrieves connection matrices, both simplifying the algorithm of [DLMS24] and bringing the theory of persistence closer to combinatorial dynamical systems. We supplement this main result with an observation: the concept of persistence as defined for scalar fields naturally adapts to Morse decompositions whose Morse sets are filtered with a Lyapunov function. We conclude by presenting preliminary experimental results.

Submitted 26 February, 2025; originally announced February 2025.

arXiv:2502.17704 [pdf, other]

Apex Representatives

Authors: Tamal K. Dey, Tao Hou, Dmitriy Morozov

Abstract: Given a zigzag filtration, we want to find its barcode representatives, i.e., a compatible choice of bases for the homology groups that diagonalize the linear maps in the zigzag. To achieve this, we convert the input zigzag to a levelset zigzag of a real-valued function. This function generates a Mayer-Vietoris pyramid of spaces, which generates an infinite strip of homology groups. We call the origins of indecomposable (diamond) summands of this strip their apexes and give an algorithm to find representative cycles in these apexes from ordinary persistence computation. The resulting representatives map back to the levelset zigzag and thus yield barcode representatives for the input zigzag. Our algorithm for lifting a $p$-dimensional cycle from ordinary persistence to an apex representative takes $O(p \cdot m \log m)$ time. From this we can recover zigzag representatives in time $O(\log m + C)$, where $C$ is the size of the output.

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.16049 [pdf, other]

Quasi Zigzag Persistence: A Topological Framework for Analyzing Time-Varying Data

Authors: Tamal K. Dey, Shreyas N. Samaga

Abstract: In this paper, we propose Quasi Zigzag Persistent Homology (QZPH) as a framework for analyzing time-varying data by integrating multiparameter persistence and zigzag persistence. To this end, we introduce a stable topological invariant that captures both static and dynamic features at different scales. We present an algorithm to compute this invariant efficiently. We show that it enhances machine learning models when applied to tasks such as sleep-stage detection, demonstrating its effectiveness in capturing the evolving patterns in time-varying datasets.

Submitted 23 May, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

arXiv:2410.20565 [pdf, other]

A Fast Algorithm for Computing Zigzag Representatives

Authors: Tamal K. Dey, Tao Hou, Dmitriy Morozov

Abstract: Zigzag filtrations of simplicial complexes generalize the usual filtrations by allowing simplex deletions in addition to simplex insertions. The barcodes computed from zigzag filtrations encode the evolution of homological features. Although one can locate a particular feature at any index in the filtration using existing algorithms, the resulting representatives may not be compatible with the zigzag: a representative cycle at one index may not map into a representative cycle at its neighbor. For this, one needs to compute compatible representative cycles along each bar in the barcode. Even though it is known that the barcode for a zigzag filtration with $m$ insertions and deletions can be computed in $O(m^\omega)$ time, it is not known how to compute the compatible representatives so efficiently. For a non-zigzag filtration, the classical matrix-based algorithm provides representatives in $O(m^3)$ time, which can be improved to $O(m^\omega)$. However, no known algorithm for zigzag filtrations computes the representatives with the $O(m^3)$ time bound. We present an $O(m^2n)$ time algorithm for this problem, where $n\leq m$ is the size of the largest complex in the filtration.

Submitted 27 October, 2024; originally announced October 2024.
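
For orientation, the classical matrix-based algorithm that the abstract above refers to for non-zigzag filtrations can be sketched as a left-to-right column reduction. This is a minimal textbook sketch over Z/2 coefficients, not the $O(m^2n)$ zigzag algorithm of the paper; the function name `reduce_boundary_matrix` and the sparse set-of-rows column encoding are assumptions made for this illustration.

```python
def reduce_boundary_matrix(columns):
    """Standard persistence reduction over Z/2 (illustrative sketch).

    columns[j] is the set of row indices of the nonzero entries in the
    boundary of the j-th simplex, with simplices indexed in filtration order.
    Returns the list of (birth, death) index pairs.
    """
    reduced = [set(c) for c in columns]   # work on copies of the columns
    pivot_owner = {}                      # lowest row index -> column that owns it
    pairs = []
    for j, col in enumerate(reduced):
        while col:
            low = max(col)                # lowest nonzero entry of the column
            if low not in pivot_owner:
                pivot_owner[low] = j      # column j is reduced; record the pair
                pairs.append((low, j))
                break
            # Add the earlier column with the same pivot (symmetric
            # difference is addition over Z/2); this updates reduced[j] in place.
            col ^= reduced[pivot_owner[low]]
    return pairs


# Hypothetical input: a filtered triangle.
# Vertices 0, 1, 2; edges 3=(0,1), 4=(1,2), 5=(0,2); triangle 6.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs = reduce_boundary_matrix(boundary)
print(pairs)
```

In this toy input, a column that reduces to zero (here the edge with index 5) creates a cycle, and the final contents of each nonzero reduced column give a representative for the corresponding pair. Each of the $m$ columns may receive up to $m$ column additions of length up to $m$, which is where the cubic worst-case bound comes from.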

arXiv:2406.07100 [pdf, other]

D-GRIL: End-to-End Topological Learning with 2-parameter Persistence

Authors: Soham Mukherjee, Shreyas N. Samaga, Cheng Xin, Steve Oudot, Tamal K. Dey

Abstract: End-to-end topological learning using 1-parameter persistence is well-known. We show that the framework can be enhanced using 2-parameter persistence by adopting a recently introduced 2-parameter persistence based vectorization technique called GRIL. We establish a theoretical foundation of differentiating GRIL, producing D-GRIL. We show that D-GRIL can be used to learn a bifiltration function on standard benchmark graph datasets. Further, we exhibit that this framework can be applied in the context of bio-activity prediction in drug discovery.

Submitted 21 February, 2025; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.02732 [pdf, other]

GEFL: Extended Filtration Learning for Graph Classification

Authors: Simon Zhang, Soham Mukherjee, Tamal K. Dey

Abstract: Extended persistence is a technique from topological data analysis to obtain global multiscale topological information from a graph. This includes information about connected components and cycles that are captured by the so-called persistence barcodes. We introduce extended persistence into a supervised learning framework for graph classification. Global topological information, in the form of a barcode with four different types of bars and their explicit cycle representatives, is combined into the model by the readout function which is computed by extended persistence. The entire model is end-to-end differentiable. We use a link-cut tree data structure and parallelism to lower the complexity of computing extended persistence, obtaining a speedup of more than 60x over the state-of-the-art for extended persistence computation. This makes extended persistence feasible for machine learning. We show that, under certain conditions, extended persistence surpasses both the WL[1] graph isomorphism test and 0-dimensional barcodes in terms of expressivity because it adds more global (topological) information. In particular, arbitrarily long cycles can be represented, which is difficult for finite receptive field message passing graph neural networks. Furthermore, we show the effectiveness of our method on real-world datasets compared to many existing recent graph representation learning methods.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 26 pages, 13 figures, Learning on Graphs Conference (LoG 2022)

arXiv:2403.10958 [pdf, other]

Efficient Algorithms for Complexes of Persistence Modules with Applications

Authors: Tamal K. Dey, Florian Russold, Shreyas N. Samaga

Abstract: We extend the persistence algorithm, viewed as an algorithm computing the homology of a complex of free persistence or graded modules, to complexes of modules that are not free. We replace persistence modules by their presentations and develop an efficient algorithm to compute the homology of a complex of presentations. To deal with inputs that are not given in terms of presentations, we give an efficient algorithm to compute a presentation of a morphism of persistence modules. This allows us to compute persistent (co)homology of instances giving rise to complexes of non-free modules. Our methods lead to a new efficient algorithm for computing the persistent homology of simplicial towers and they enable efficient algorithms to compute the persistent homology of cosheaves over simplicial towers and cohomology of persistent sheaves on simplicial complexes. We also show that we can compute the cohomology of persistent sheaves over arbitrary finite posets by reducing the computation to a computation over simplicial complexes.

Submitted 16 March, 2024; originally announced March 2024.

Comments: This is the full version of a paper accepted at the 40th International Symposium on Computational Geometry (SoCG 2024)

arXiv:2403.08110 [pdf, other]

Computing Generalized Ranks of Persistence Modules via Unfolding to Zigzag Modules

Authors: Tamal K. Dey, Cheng Xin

Abstract: For a $P$-indexed persistence module ${\sf M}$, the (generalized) rank of ${\sf M}$ is defined as the rank of the limit-to-colimit map for the diagram of vector spaces of ${\sf M}$ over the poset $P$. For $2$-parameter persistence modules, recently a zigzag persistence based algorithm has been proposed that takes advantage of the fact that generalized rank for $2$-parameter modules is equal to the number of full intervals in a zigzag module defined on the boundary of the poset. An analogous definition of boundary for $d$-parameter persistence modules or general $P$-indexed persistence modules does not seem plausible. To overcome this difficulty, we first unfold a given $P$-indexed module ${\sf M}$ into a zigzag module ${\sf M}_{ZZ}$ and then check how many full interval modules in a decomposition of ${\sf M}_{ZZ}$ can be folded back to remain full in a decomposition of ${\sf M}$. This number determines the generalized rank of ${\sf M}$. For special cases of degree-$d$ homology for $d$-complexes, we obtain a more efficient algorithm including a linear time algorithm for degree-$1$ homology in graphs.

Submitted 18 May, 2025; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.11339 [pdf, other]

Expressive Higher-Order Link Prediction through Hypergraph Symmetry Breaking

Authors: Simon Zhang, Cheng Xin, Tamal K. Dey

Abstract: A hypergraph consists of a set of nodes along with a collection of subsets of the nodes called hyperedges. Higher-order link prediction is the task of predicting the existence of a missing hyperedge in a hypergraph. A hyperedge representation learned for higher order link prediction is fully expressive when it does not lose distinguishing power up to an isomorphism. Many existing hypergraph representation learners are bounded in expressive power by the Generalized Weisfeiler Lehman-1 (GWL-1) algorithm, a generalization of the Weisfeiler Lehman-1 algorithm. However, GWL-1 has limited expressive power. In fact, induced subhypergraphs with identical GWL-1 valued nodes are indistinguishable. Furthermore, message passing on hypergraphs can already be computationally expensive, especially on GPU memory. To address these limitations, we devise a preprocessing algorithm that can identify certain regular subhypergraphs exhibiting symmetry. Our preprocessing algorithm runs once with complexity linear in the size of the input hypergraph. During training, we randomly replace subhypergraphs identified by the algorithm with covering hyperedges to break symmetry. We show that our method improves the expressivity of GWL-1. Our extensive experiments also demonstrate the effectiveness of our approach for higher-order link prediction on both graph and hypergraph datasets with negligible change in computation.

Submitted 2 December, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: 64 pages, 8 figures

Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2024

arXiv:2307.07462 [pdf, other]

Computing Zigzag Vineyard Efficiently Including Expansions and Contractions

Authors: Tamal K. Dey, Tao Hou

Abstract: Vines and vineyard connecting a stack of persistence diagrams have been introduced in the non-zigzag setting by Cohen-Steiner et al. We consider computing these vines over changing filtrations for zigzag persistence while incorporating two more operations: expansions and contractions in addition to the transpositions considered in the non-zigzag setting. Although expansions and contractions can be implemented in quadratic time in the non-zigzag case by utilizing the linear-time transpositions, it is not obvious how they can be carried out under the zigzag framework with the same complexity. While transpositions alone can be easily conducted in linear time using the recent FastZigzag algorithm, expansions and contractions pose difficulty in breaking the barrier of cubic complexity. Our main result is that the half-way constructed up-down filtration in the FastZigzag algorithm indeed can be used to achieve linear time complexity for transpositions and quadratic time complexity for expansions and contractions, matching the time complexity of all corresponding operations in the non-zigzag case.

Submitted 18 February, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: Updated funding information for one co-author

arXiv:2304.04970 [pdf, other]

GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning

Authors: Cheng Xin, Soham Mukherjee, Shreyas N. Samaga, Tamal K. Dey

Abstract: $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules. Further, we augment GNNs with GRIL features and observe an increase in performance indicating that GRIL can capture additional features enriching GNNs. We make the complete code for the proposed method available at https://github.com/soham0209/mpml-graph.

Submitted 30 June, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2303.08270 [pdf, other]

Meta-Diagrams for 2-Parameter Persistence

Authors: Nate Clause, Tamal K. Dey, Facundo Mémoli, Bei Wang

Abstract: We first introduce the notion of meta-rank for a 2-parameter persistence module, an invariant that captures the information behind images of morphisms between 1D slices of the module. We then define the meta-diagram of a 2-parameter persistence module to be the Möbius inversion of the meta-rank, resulting in a function that takes values from signed 1-parameter persistence modules. We show that the meta-rank and meta-diagram contain information equivalent to the rank invariant and the signed barcode. This equivalence leads to computational benefits, as we introduce an algorithm for computing the meta-rank and meta-diagram of a 2-parameter module $M$ indexed by a bifiltration of $n$ simplices in $O(n^3)$ time. This implies an improvement upon the existing algorithm for computing the signed barcode, which has $O(n^4)$ runtime. This also allows us to improve the existing upper bound on the number of rectangles in the rank decomposition of $M$ from $O(n^4)$ to $O(n^3)$. In addition, we define notions of erosion distance between meta-ranks and between meta-diagrams, and show that under these distances, meta-ranks and meta-diagrams are stable with respect to the interleaving distance. Lastly, the meta-diagram can be visualized in an intuitive fashion as a persistence diagram of diagrams, which generalizes the well-understood persistence diagram in the 1-parameter setting.

Submitted 14 March, 2023; originally announced March 2023.

Comments: 22 pages, 8 figures. Full version of the paper that is to appear in the Proceedings of the 39th International Symposium on Computational Geometry (SoCG 2023)

arXiv:2303.02549 [pdf, other]

Computing Connection Matrices via Persistence-like Reductions

Authors: Tamal K. Dey, Michał Lipiński, Marian Mrozek, Ryan Slechta

Abstract: Connection matrices are a generalization of Morse boundary operators from the classical Morse theory for gradient vector fields. Developing an efficient computational framework for connection matrices is particularly important in the context of a rapidly growing data science that requires new mathematical tools for discrete data. Toward this goal, the classical theory for connection matrices has been adapted to combinatorial frameworks that facilitate computation. We develop an efficient persistence-like algorithm to compute a connection matrix from a given combinatorial (multi) vector field on a simplicial complex. This algorithm requires a single pass, improving upon a known algorithm that runs an implicit recursion executing two passes at each level. Overall, the new algorithm is simpler, more direct, and more efficient than the state-of-the-art. Because of the algorithm's similarity to the persistence algorithm, one may take advantage of various software optimizations from topological data analysis.

Submitted 23 September, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2302.12796 [pdf, other]

Revisiting Graph Persistence for Updates and Efficiency

Authors: Tamal K. Dey, Tao Hou, Salman Parsa

Abstract: It is well known that ordinary persistence on graphs can be computed more efficiently than the general persistence. Recently, it has been shown that zigzag persistence on graphs also exhibits similar behavior. Motivated by these results, we revisit graph persistence and propose efficient algorithms especially for local updates on filtrations, similar to what is done in ordinary persistence for computing the vineyard. We show that, for a filtration of length $m$, (i) switches (transpositions) in ordinary graph persistence can be done in $O(\log m)$ time; (ii) zigzag persistence on graphs can be computed in $O(m\log m)$ time, which improves a recent $O(m\log^4n)$ time algorithm assuming $n$, the size of the union of all graphs in the filtration, satisfies $n\in\Omega(m^\varepsilon)$ for any fixed $0<\varepsilon<1$; (iii) open-closed, closed-open, and closed-closed bars in dimension $0$ for graph zigzag persistence can be updated in $O(\log m)$ time, whereas the open-open bars in dimension $0$ and closed-closed bars in dimension $1$ can be done in $O(\sqrt{m}\,\log m)$ time.

Submitted 11 May, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

arXiv:2212.01633 [pdf, other]

Cup Product Persistence and Its Efficient Computation

Authors: Tamal K. Dey, Abhishek Rathod

Abstract: It is well-known that the cohomology ring has a richer structure than homology groups. However, until recently, the use of cohomology in persistence setting has been limited to speeding up of barcode computations. Some of the recently introduced invariants, namely, persistent cup-length, persistent cup modules and persistent Steenrod modules, to some extent, fill this gap. When added to the standard persistence barcode, they lead to invariants that are more discriminative than the standard persistence barcode. In this work, we devise an $O(d n^4)$ algorithm for computing the persistent $k$-cup modules for all $k \in \{2, \dots, d\}$, where $d$ denotes the dimension of the filtered complex, and $n$ denotes its size. Moreover, we note that since the persistent cup length can be obtained as a byproduct of our computations, this leads to a faster algorithm for computing it for $d>3$. Finally, we introduce a new stable invariant called partition modules of cup product that is more discriminative than persistent $k$-cup modules and devise an $O(c(d)n^4)$ algorithm for computing it, where $c(d)$ is subexponential in $d$.

Submitted 17 March, 2024; v1 submitted 3 December, 2022; originally announced December 2022.

Comments: To appear in Proceedings of 40th International Symposium on Computational Geometry

arXiv:2207.14358 [pdf, other]

Topological structure of complex predictions

Authors: Meng Liu, Tamal K. Dey, David F. Gleich

Abstract: Complex prediction models such as deep learning are the output from fitting machine learning, neural networks, or AI models to a set of training data. These are now standard tools in science. A key challenge with the current generation of models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into pictures representing a topological view. The result is a map of the predictions that enables inspection. The methods scale up to large datasets across different domains and enable us to detect labeling errors in training data, understand generalization in image classification, and inspect predictions of likely pathogenic mutations in the BRCA1 gene.

Submitted 19 October, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

arXiv:2206.00606 [pdf, other]

Topological Deep Learning: Going Beyond Graph Data

Authors: Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou, Nina Miolane, Aldo Guzmán-Sáenz, Karthikeyan Natesan Ramamurthy, Tolga Birdal, Tamal K. Dey, Soham Mukherjee, Shreyas N. Samaga, Neal Livesay, Robin Walters, Paul Rosen, Michael T. Schaub

Abstract: Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widely adopted topological domains. Specifically, we first introduce combinatorial complexes, a novel type of topological domain. Combinatorial complexes can be seen as generalizations of graphs that maintain certain desirable properties. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations. In addition, combinatorial complexes permit the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. Thus, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces. Second, building upon combinatorial complexes and their rich combinatorial and algebraic structure, we develop a general class of message-passing combinatorial complex neural networks (CCNNs), focusing primarily on attention-based CCNNs. We characterize permutation and orientation equivariances of CCNNs, and discuss pooling and unpooling operations within CCNNs in detail. Third, we evaluate the performance of CCNNs on tasks related to mesh shape analysis and graph learning. Our experiments demonstrate that CCNNs have competitive performance as compared to state-of-the-art deep learning models specifically tailored to the same tasks. Our findings demonstrate the advantages of incorporating higher-order relations into deep learning models in different applications.

Submitted 19 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

arXiv:2204.11080 [pdf, other]

Fast Computation of Zigzag Persistence

Authors: Tamal K. Dey, Tao Hou

Abstract: Zigzag persistence is a powerful extension of the standard persistence which allows deletions of simplices besides insertions. However, computing zigzag persistence usually takes considerably more time than the standard persistence. We propose an algorithm called FastZigzag which narrows this efficiency gap. Our main result is that an input simplex-wise zigzag filtration can be converted to a cell-wise non-zigzag filtration of a $Δ$-complex with the same length, where the cells are copies of the input simplices. This conversion step in FastZigzag incurs very little cost. Furthermore, the barcode of the original filtration can be easily read from the barcode of the new cell-wise filtration because the conversion embodies a series of diamond switches known in topological data analysis. This seemingly simple observation opens up vast possibilities for improving the computation of zigzag persistence because any efficient algorithm/software for standard persistence can now be applied to computing zigzag persistence. Our experiment shows that this indeed achieves substantial performance gain over the existing state-of-the-art software.

Submitted 4 July, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2110.06315

arXiv:2203.05727 [pdf, other]

Tracking Dynamical Features via Continuation and Persistence

Authors: Tamal K. Dey, Michał Lipiński, Marian Mrozek, Ryan Slechta

Abstract: Multivector fields and combinatorial dynamical systems have recently become a subject of interest due to their potential for use in computational methods. In this paper, we develop a method to track an isolated invariant set -- a salient feature of a combinatorial dynamical system -- across a sequence of multivector fields. This goal is attained by placing the classical notion of the "continuation" of an isolated invariant set in the combinatorial setting. In particular, we give a "Tracking Protocol" that, when given a seed isolated invariant set, finds a canonical continuation of the seed across a sequence of multivector fields. In cases where it is not possible to continue, we show how to use zigzag persistence to track homological features associated with the isolated invariant sets. This construction permits viewing continuation as a special case of persistence.

Submitted 10 March, 2022; originally announced March 2022.

Comments: Full version of SoCG 2022 paper

arXiv:2112.02352 [pdf, other]

Updating Barcodes and Representatives for Zigzag Persistence

Authors: Tamal K. Dey, Tao Hou

Abstract: Computing persistence over changing filtrations gives rise to a stack of 2D persistence diagrams where the birth-death points are connected by the so-called `vines'. We consider computing these vines over changing filtrations for zigzag persistence. We observe that eight atomic operations are sufficient for changing one zigzag filtration to another and provide update algorithms for each of them. Six of these operations that have some analogues to one or multiple transpositions in the non-zigzag case can be executed as efficiently as their non-zigzag counterparts. This approach takes advantage of a recently discovered algorithm for computing zigzag barcodes by converting a zigzag filtration to a non-zigzag one and then connecting barcodes of the two with a bijection. The remaining two atomic operations do not have a strict analogue in the non-zigzag case. For them, we propose algorithms based on explicit maintenance of representatives (homology cycles) which can be useful in their own right for applications requiring explicit updates of representatives.

Submitted 1 August, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

arXiv:2111.15058 [pdf, other]

Computing Generalized Rank Invariant for 2-Parameter Persistence Modules via Zigzag Persistence and its Applications

Authors: Tamal K. Dey, Woojin Kim, Facundo Mémoli

Abstract: The notion of generalized rank invariant in the context of multiparameter persistence has become an important ingredient for defining interesting homological structures such as generalized persistence diagrams. Naturally, computing these rank invariants efficiently is a prelude to computing any of these derived structures efficiently. We show that the generalized rank over a finite interval $I$ of a $\mathbb{Z}^2$-indexed persistence module $M$ is equal to the generalized rank of the zigzag module that is induced on a certain path in $I$ tracing mostly its boundary. Hence, we can compute the generalized rank over $I$ by computing the barcode of the zigzag module obtained by restricting the bifiltration inducing $M$ to that path. If the bifiltration and $I$ have at most $t$ simplices and points respectively, this computation takes $O(t^ω)$ time where $ω\in[2,2.373)$ is the exponent of matrix multiplication. Among others, we apply this result to obtain an improved algorithm for the following problem. Given a bifiltration inducing a module $M$, determine whether $M$ is interval decomposable and, if so, compute all intervals supporting its summands.

Submitted 30 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: Full version of the paper in the Proceedings of the 38th International Symposium on Computational Geometry (SoCG 2022). Shortened the proof of Theorem 3.12 and added new sections 4.4 and 4.5; 21 pages, 4 figures

arXiv:2110.14734 [pdf, other]

Approximating 1-Wasserstein Distance between Persistence Diagrams by Graph Sparsification

Authors: Tamal K. Dey, Simon Zhang

Abstract: Persistence diagrams (PDs) play a central role in topological data analysis. This analysis requires computing distances among such diagrams such as the $1$-Wasserstein distance. Accurate computation of these PD distances for large data sets that render large diagrams may not scale appropriately with the existing methods. The main source of difficulty ensues from the size of the bipartite graph on which a matching needs to be computed for determining these PD distances. We address this problem by making several algorithmic and computational observations in order to obtain, in theory, a near-linear fully polynomial-time approximation scheme. This is theoretically optimal assuming the $(1+ε)$-approximate EMD conjecture in constant dimension, which is that the EMD problem on the plane cannot be approximated by a PTAS in time $O(\frac{1}{ε^2}n)$ up to polylog factors. In our implementation, first, taking advantage of the distribution of PD points, we \emph{condense} them thereby decreasing the number of nodes in the graph for computation. The increase in point multiplicities is addressed by reducing the matching problem to a min-cost flow problem on a transshipment network. Second, we use Well Separated Pair Decomposition to sparsify the graph to a size that is linear in the number of points. Both node and arc sparsifications contribute to the approximation factor where we leverage a lower bound given by the Relaxed Word Mover's distance. Third, we eliminate bottlenecks during the sparsification procedure by introducing parallelism.
Fourth, we develop an open source software called PDoptFlow based on our algorithm, exploiting parallelism by GPU and multicore. We perform extensive experiments and show that the actual empirical error is very low. We also show that we can achieve high performance at low guaranteed relative errors, improving upon the state of the art.

Submitted 9 May, 2025; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 39 pages, 12 figures; extended version of paper published in ALENEX 2022

arXiv:2110.06315 [pdf, other]

On Association between Absolute and Relative Zigzag Persistence

Authors: Tamal K. Dey, Tao Hou

Abstract: Duality results connecting persistence modules for absolute and relative homology provide a fundamental understanding of persistence theory. In this paper, we study similar associations in the context of zigzag persistence. Our main finding is a weak duality for the so-called non-repetitive zigzag filtrations in which a simplex is never added again after being deleted. The technique used to prove the duality for non-zigzag persistence does not extend straightforwardly to our case. Accordingly, taking a different route, we prove the weak duality by converting a non-repetitive filtration to an up-down filtration by a sequence of diamond switches. We then show an application of the weak duality result which gives a near-linear algorithm for computing the $p$-th and a subset of the $(p-1)$-th persistence for a non-repetitive zigzag filtration of a simplicial $p$-manifold. Utilizing the fact that a non-repetitive filtration admits an up-down filtration as its canonical form, we further reduce the problem of computing zigzag persistence for non-repetitive filtrations to the problem of computing standard persistence for which several efficient implementations exist. Our experiment shows that this achieves substantial performance gain. Our study also identifies repetitive filtrations as instances that fundamentally distinguish zigzag persistence from the standard persistence.

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2108.07429 [pdf, other]

Rectangular Approximation and Stability of $2$-parameter Persistence Modules

Authors: Tamal K. Dey, Cheng Xin

Abstract: One of the main reasons for topological persistence being useful in data analysis is that it is backed up by a stability (isometry) property: persistence diagrams of $1$-parameter persistence modules are stable in the sense that the bottleneck distance between two diagrams equals the interleaving distance between their generating modules. However, in multi-parameter setting this property breaks down in general. A simple special case of persistence modules called rectangle decomposable modules is known to admit a weaker stability property. Using this fact, we derive a stability-like property for $2$-parameter persistence modules. For this, first we consider interval decomposable modules and their optimal approximations with rectangle decomposable modules with respect to the bottleneck distance. We provide a polynomial time algorithm to exactly compute this optimal approximation which, together with the polynomial-time computable bottleneck distance among interval decomposable modules, provides a lower bound on the interleaving distance. Next, we leverage this result to derive a polynomial-time computable distance for general multi-parameter persistence modules which enjoys similar stability-like property. This distance can be viewed as a generalization of the matching distance defined in the literature.

Submitted 17 August, 2021; originally announced August 2021.

arXiv:2107.02115 [pdf, other]

Persistence of Conley-Morse Graphs in Combinatorial Dynamical Systems

Authors: Tamal K. Dey, Marian Mrozek, Ryan Slechta

Abstract: Multivector fields provide an avenue for studying continuous dynamical systems in a combinatorial framework. There are currently two approaches in the literature which use persistent homology to capture changes in combinatorial dynamical systems. The first captures changes in the Conley index, while the second captures changes in the Morse decomposition. However, such approaches have limitations. The former approach only describes how the Conley index changes across a selected isolated invariant set though the dynamics can be much more complicated than the behavior of a single isolated invariant set. Likewise, considering a Morse decomposition omits much information about the individual Morse sets. In this paper, we propose a method to summarize changes in combinatorial dynamical systems by capturing changes in the so-called Conley-Morse graphs. A Conley-Morse graph contains information about both the structure of a selected Morse decomposition and about the Conley index at each Morse set in the decomposition. Hence, our method summarizes the changing structure of a sequence of dynamical systems at a finer granularity than previous approaches.

Submitted 5 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2105.00518 [pdf, other]

Computing Optimal Persistent Cycles for Levelset Zigzag on Manifold-like Complexes

Authors: Tamal K. Dey, Tao Hou, Anirudh Pulavarthy

Abstract: In standard persistent homology, a persistent cycle born and dying with a persistence interval (bar) associates the bar with a concrete topological representative, which provides means to effectively navigate back from the barcode to the topological space. Among the possibly many, optimal persistent cycles bring forth further information due to having guaranteed quality. However, topological features usually go through variations in the lifecycle of a bar which a single persistent cycle may not capture. Hence, for persistent homology induced from PL functions, we propose levelset persistent cycles consisting of a sequence of cycles that depict the evolution of homological features from birth to death. Our definition is based on levelset zigzag persistence which involves four types of persistence intervals as opposed to the two types in standard persistence. For each of the four types, we present a polynomial-time algorithm computing an optimal sequence of levelset persistent $p$-cycles for the so-called weak $(p+1)$-pseudomanifolds. Given that optimal cycle problems for homology are NP-hard in general, our results are useful in practice because weak pseudomanifolds do appear in applications. Our algorithms draw upon an idea of relating optimal cycles to min-cuts in a graph that was exploited earlier for standard persistent cycles. Notice that levelset zigzag poses non-trivial challenges for the approach because a sequence of optimal cycles instead of a single one needs to be computed in this case.
We show some empirical evidence that optimal cycles produced by our implemented software have nice quality.

Submitted 27 February, 2025; v1 submitted 2 May, 2021; originally announced May 2021.

arXiv:2104.13430 [pdf, other]

Topological Filtering for 3D Microstructure Segmentation

Authors: Anand V. Patel, Tao Hou, Juan D. Beltran Rodriguez, Tamal K. Dey, Dunbar P. Birnie III

Abstract: Tomography is a widely used tool for analyzing microstructures in three dimensions (3D). The analysis, however, faces difficulty because the constituent materials produce similar grey-scale values. Sometimes, this prompts the image segmentation process to assign a pixel/voxel to the wrong phase (active material or pore). Consequently, errors are introduced in the microstructure characteristics calculation. In this work, we develop a filtering algorithm called PerSplat based on topological persistence (a technique used in topological data analysis) to improve segmentation quality. One problem faced when evaluating filtering algorithms is that real image data in general are not equipped with the `ground truth' for the microstructure characteristics. For this study, we construct synthetic images for which the ground-truth values are known. On the synthetic images, we compare the pore tortuosity and Minkowski functionals (volume and surface area) computed with our PerSplat filter and other methods such as total variation (TV) and non-local means (NL-means). Moreover, on a real 3D image, we visually compare the segmentation results provided by our filter against TV and NL-means. The experimental results indicate that PerSplat provides a significant improvement in segmentation quality.

Submitted 26 September, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

arXiv:2103.09583 [pdf, other]

2D Points Curve Reconstruction Survey and Benchmark

Authors: Stefan Ohrhallinger, Jiju Peethambaran, Amal D. Parakkat, Tamal K. Dey, Ramanathan Muthuganapathy

Abstract: Curve reconstruction from unstructured points in a plane is a fundamental problem with many applications that has generated research interest for decades. Involved aspects like handling open, sharp, multiple and non-manifold outlines, run-time and provability as well as potential extension to 3D for surface reconstruction have led to many different algorithms. We survey the literature on 2D curve reconstruction and then present an open-sourced benchmark for the experimental study. Our unprecedented evaluation on a selected set of planar curve reconstruction algorithms aims to give an overview of both quantitative analysis and qualitative aspects for helping users to select the right algorithm for specific problems in the field. Our benchmark framework is available online to permit reproducing the results, and easy integration of new algorithms.

Submitted 17 March, 2021; originally announced March 2021.

Comments: 24 pages, 22 figures, 5 tables

arXiv:2103.07353 [pdf, ps, other]

Computing Zigzag Persistence on Graphs in Near-Linear Time

Authors: Tamal K. Dey, Tao Hou

Abstract: Graphs model real-world circumstances in many applications where they may constantly change to capture the dynamic behavior of the phenomena. Topological persistence which provides a set of birth and death pairs for the topological features is one instrument for analyzing such changing graph data. However, standard persistent homology defined over a growing space cannot always capture such a dynamic process unless shrinking with deletions is also allowed. Hence, zigzag persistence which incorporates both insertions and deletions of simplices is more appropriate in such a setting. Unlike standard persistence which admits nearly linear-time algorithms for graphs, such results for the zigzag version improving the general $O(m^ω)$ time complexity are not known, where $ω< 2.37286$ is the matrix multiplication exponent. In this paper, we propose algorithms for zigzag persistence on graphs which run in near-linear time. Specifically, given a filtration with $m$ additions and deletions on a graph with $n$ vertices and edges, the algorithm for $0$-dimension runs in $O(m\log^2 n+m\log m)$ time and the algorithm for 1-dimension runs in $O(m\log^4 n)$ time. The algorithm for $0$-dimension draws upon another algorithm designed originally for pairing critical points of Morse functions on $2$-manifolds. The algorithm for $1$-dimension pairs a negative edge with the earliest positive edge so that a $1$-cycle containing both edges resides in all intermediate graphs.
Both algorithms achieve the claimed time complexity via dynamic graph data structures proposed by Holm et al. In the end, using Alexander duality, we extend the algorithm for $0$-dimension to compute the $(p-1)$-dimensional zigzag persistence for $\mathbb{R}^p$-embedded complexes in $O(m\log^2 n+m\log m+n\log n)$ time.

Submitted 12 March, 2021; originally announced March 2021.

Comments: The full version of the paper

arXiv:2003.05579 [pdf, other]

Persistence of the Conley Index in Combinatorial Dynamical Systems

Authors: Tamal K. Dey, Marian Mrozek, Ryan Slechta

Abstract: A combinatorial framework for dynamical systems provides an avenue for connecting classical dynamics with data-oriented, algorithmic methods. Combinatorial vector fields introduced by Forman and their recent generalization to multivector fields have provided a starting point for building such a connection. In this work, we strengthen this relationship by placing the Conley index in the persistent homology setting. Conley indices are homological features associated with so-called isolated invariant sets, so a change in the Conley index is a response to perturbation in an underlying multivector field. We show how one can use zigzag persistence to summarize changes to the Conley index, and we develop techniques to capture such changes in the presence of noise. We conclude by developing an algorithm to track features in a changing multivector field.

Submitted 11 March, 2020; originally announced March 2020.

arXiv:2001.09549 [pdf, other]

An efficient algorithm for $1$-dimensional (persistent) path homology

Authors: Tamal K. Dey, Tianqi Li, Yusu Wang

Abstract: This paper focuses on developing an efficient algorithm for analyzing a directed network (graph) from a topological viewpoint. A prevalent technique for such topological analysis involves computation of homology groups and their persistence. These concepts are well suited for spaces that are not directed. As a result, one needs a concept of homology that accommodates orientations in input space. Path-homology developed for directed graphs by Grigor'yan, Lin, Muranov and Yau has been effectively adapted for this purpose recently by Chowdhury and Mémoli. They also give an algorithm to compute this path-homology. Our main contribution in this paper is an algorithm that computes this path-homology and its persistence more efficiently for the $1$-dimensional ($H_1$) case. In developing such an algorithm, we discover various structures and their efficient computations that aid computing the $1$-dimensional path-homology. We implement our algorithm and present some preliminary experimental results.

Submitted 26 January, 2020; originally announced January 2020.

arXiv:1909.06728 [pdf, other]

doi 10.1145/3347146.3359348

Road Network Reconstruction from Satellite Images with Machine Learning Supported by Topological Methods

Authors: Tamal K. Dey, Jiayuan Wang, Yusu Wang

Abstract: Automatic extraction of road networks from satellite images is a goal that can benefit and even enable new technologies. Methods that combine machine learning (ML) and computer vision have been proposed in recent years which make the task semi-automatic by requiring the user to provide curated training samples. The process can be fully automatized if training samples can be produced algorithmically. Of course, this requires a robust algorithm that can reconstruct the road networks from satellite images reliably so that the output can be fed as training samples. In this work, we develop such a technique by infusing a persistence-guided discrete Morse based graph reconstruction algorithm into an ML framework. We elucidate our contributions in two phases. First, in a semi-automatic framework, we combine a discrete-Morse based graph reconstruction algorithm with an existing CNN framework to segment input satellite images. We show that this leads to reconstructions with better connectivity and less noise. Next, in a fully automatic framework, we leverage the power of the discrete-Morse based graph reconstruction algorithm to train a CNN from a collection of images without labelled data and use the same algorithm to produce the final output from the segmented images created by the trained CNN. We apply the discrete-Morse based graph reconstruction algorithm iteratively to improve the accuracy of the CNN. We show promising experimental results of this new framework on datasets from SpaceNet Challenge.

Submitted 15 September, 2019; originally announced September 2019.

Comments: 26 pages, 13 figures, ACM SIGSPATIAL 2019

arXiv:1907.04889 [pdf, other]

Computing Minimal Persistent Cycles: Polynomial and Hard Cases

Authors: Tamal K. Dey, Tao Hou, Sayan Mandal

Abstract: Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in a purely topological persistence diagram (also termed as barcode). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard while the same for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with $\mathbb{Z}_2$ coefficients. In addition to proving that it is NP-hard to compute minimal persistent d-cycles (d>1) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds which we term as weak pseudomanifolds. For finite intervals from the d-th persistence diagram of a weak (d+1)-pseudomanifold, we utilize the fact that persistent cycles of such intervals are null-homologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NP-hard, we further assume the weak (d+1)-pseudomanifold to be embedded in $\mathbb{R}^{d+1}$ so that the complex has a natural dual graph structure and the problem reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data.

Submitted 14 February, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

Comments: Content same as appeared in the proceeding of SODA20'

arXiv:1904.03766 [pdf, other]

Generalized Persistence Algorithm for Decomposing Multi-parameter Persistence Modules

Authors: Tamal K. Dey, Cheng Xin

Abstract: The classical persistence algorithm computes the unique decomposition of a persistence module implicitly given by an input simplicial filtration. Based on matrix reduction, this algorithm is a cornerstone of the emergent area of topological data analysis. Its input is a simplicial filtration defined over the integers $\mathbb{Z}$ giving rise to a $1$-parameter persistence module. It has been recognized that multiparameter version of persistence modules given by simplicial filtrations over $d$-dimensional integer grids $\mathbb{Z}^d$ is equally or perhaps more important in data science applications. However, in the multiparameter setting, one of the main challenges is that topological summaries based on algebraic structure such as decompositions and bottleneck distances cannot be as efficiently computed as in the $1$-parameter case because there is no known extension of the persistence algorithm to multiparameter persistence modules. We present an efficient algorithm to compute the unique decomposition of a finitely presented persistence module $M$ defined over the multiparameter $\mathbb{Z}^d$. The algorithm first assumes that the module is presented with a set of $N$ generators and relations that are \emph{distinctly graded}. Based on a generalized matrix reduction technique it runs in $O(N^{2ω+1})$ time where $ω<2.373$ is the exponent for matrix multiplication. This is much better than the well known algorithm called Meataxe which runs in $\tilde{O}(N^{6(d+1)})$ time on such an input.
In practice, persistence modules are usually induced by simplicial filtrations. With such an input consisting of $n$ simplices, our algorithm runs in $O(n^{(d-1)(2ω+ 1)})$ time for $d\geq 2$. For the special case of zero dimensional homology, it runs in time $O(n^{2ω+1})$.

Submitted 6 December, 2021; v1 submitted 7 April, 2019; originally announced April 2019.

arXiv:1810.04807 [pdf, other]

Persistent 1-Cycles: Definition, Computation, and Its Application

Authors: Tamal K. Dey, Tao Hou, Sayan Mandal

Abstract: Persistence diagrams, which summarize the birth and death of homological features extracted from data, are employed as stable signatures for applications in image analysis and other areas. Besides simply considering the multiset of intervals included in a persistence diagram, some applications need to find representative cycles for the intervals. In this paper, we address the problem of computing these representative cycles, termed as persistent 1-cycles, for $\text{H}_1$-persistent homology with $\mathbb{Z}_2$ coefficients. The definition of persistent cycles is based on the interval module decomposition of persistence modules, which reveals the structure of persistent homology. After showing that the computation of the optimal persistent 1-cycles is NP-hard, we propose an alternative set of meaningful persistent 1-cycles that can be computed with an efficient polynomial time algorithm. We also inspect the stability issues of the optimal persistent 1-cycles and the persistent 1-cycles computed by our algorithm with the observation that the perturbations of both cannot be properly bounded. We design software that applies our algorithm to various datasets. Experiments on 3D point clouds, mineral structures, and images show the effectiveness of our algorithm in practice.

Submitted 15 October, 2018; v1 submitted 10 October, 2018; originally announced October 2018.

Comments: Correct the algorithm numbering issue

arXiv:1810.04388 [pdf, ps, other]

Filtration Simplification for Persistent Homology via Edge Contraction

Authors: Tamal K. Dey, Ryan Slechta

Abstract: Persistent homology is a popular data analysis technique that is used to capture the changing topology of a filtration associated with some simplicial complex $K$. These topological changes are summarized in persistence diagrams. We propose two contraction operators which when applied to $K$ and its associated filtration, bound the perturbation in the persistence diagrams. The first assumes that the underlying space of $K$ is a $2$-manifold and ensures that simplices are paired with the same simplices in the contracted complex as they are in the original. The second is for arbitrary $d$-complexes, and bounds the bottleneck distance between the initial and contracted $p$-dimensional persistence diagrams. This is accomplished by defining interleaving maps between persistence modules which arise from chain maps defined over the filtrations. In addition, we show how the second operator can efficiently compose across multiple contractions. We conclude with experiments demonstrating the second operator's utility on manifolds.

Submitted 10 October, 2018; originally announced October 2018.

Comments: 15 pages including proofs and references, 5 figures, 2 tables. Full version

arXiv:1807.03655 [pdf, other]

Computing Height Persistence and Homology Generators in $\mathbb{R}^3$ Efficiently

Authors: Tamal K. Dey

Abstract: Recently it has been shown that computing the dimension of the first homology group $H_1(K)$ of a simplicial $2$-complex $K$ embedded linearly in $\mathbb{R}^4$ is as hard as computing the rank of a sparse $0-1$ matrix. This puts a major roadblock to computing persistence and a homology basis (generators) for complexes embedded in $\mathbb{R}^4$ and beyond in less than quadratic or even near-quadratic time. But, what about dimension three? It is known that persistence for piecewise linear functions on a complex $K$ with $n$ simplices can be computed in $O(n\log n)$ time and a set of generators of total size $k$ can be computed in $O(n+k)$ time when $K$ is a graph or a surface linearly embedded in $\mathbb{R}^3$. But, the question for general simplicial complexes $K$ linearly embedded in $\mathbb{R}^3$ is not completely settled. No algorithm with a complexity better than that of the matrix multiplication is known for this important case. We show that the persistence for {\em height functions} on such complexes, hence called {\em height persistence}, can be computed in $O(n\log n)$ time. This allows us to compute a basis (generators) of $H_i(K)$, $i=1,2$, in $O(n\log n+k)$ time where $k$ is the size of the output. This improves significantly the current best bound of $O(n^ω)$, $ω$ being the matrix multiplication exponent. We achieve these improved bounds by leveraging recent results on zigzag persistence in computational topology, new observations about Reeb graphs, and some efficient geometric data structures.

Submitted 14 March, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

Journal ref: SODA 2019

arXiv:1803.05093 [pdf, other]

doi 10.4230/LIPIcs.SoCG.2018.31

Graph Reconstruction by Discrete Morse Theory

Authors: Tamal K. Dey, Jiayuan Wang, Yusu Wang

Abstract: Recovering hidden graph-like structures from potentially noisy data is a fundamental task in modern data analysis. Recently, a persistence-guided discrete Morse-based framework to extract a geometric graph from low-dimensional data has become popular. However, to date, there is very limited theoretical understanding of this framework in terms of graph reconstruction. This paper makes a first step towards closing this gap. Specifically, first, leveraging existing theoretical understanding of persistence-guided discrete Morse cancellation, we provide a simplified version of the existing discrete Morse-based graph reconstruction algorithm. We then introduce a simple and natural noise model and show that the aforementioned framework can correctly reconstruct a graph under this noise model, in the sense that it has the same loop structure as the hidden ground-truth graph, and is also geometrically close. We also provide some experimental results for our simplified graph-reconstruction algorithm.

Submitted 20 March, 2018; v1 submitted 13 March, 2018; originally announced March 2018.

Comments: 25 pages, 22 figures

arXiv:1803.02869 [pdf, other]

Computing Bottleneck Distance for Multi-parameter Interval Decomposable Persistence Modules

Authors: Tamal K. Dey, Cheng Xin

Abstract: Computation of the interleaving distance between persistence modules is a central task in topological data analysis. For $1$-parameter persistence modules, thanks to the isometry theorem, this can be done by computing the bottleneck distance with known efficient algorithms. The question is open for most $n$-parameter persistence modules, $n>1$, because of the well recognized complications of the indecomposables. Here, we consider a reasonably complicated class called {\em $n$-parameter interval decomposable} modules whose indecomposables may have a description of non-constant complexity. We present a polynomial time algorithm to compute the bottleneck distance for these modules from indecomposables, which bounds the interleaving distance from above, and give another algorithm to compute a new distance called {\em dimension distance} that bounds it from below. An earlier version of this paper considered only the $2$-parameter interval decomposable modules~\cite{DeyCheng18}.

Submitted 3 October, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: This is the n-parameter extension of the conference paper that appeared in SoCG 2018 (which was only for 2-parameter case)

arXiv:1801.06759 [pdf, ps, other]

Efficient algorithms for computing a minimal homology basis

Authors: Tamal K. Dey, Tianqi Li, Yusu Wang

Abstract: Efficient computation of shortest cycles which form a homology basis under $\mathbb{Z}_2$-additions in a given simplicial complex $\mathcal{K}$ has been researched actively in recent years. When the complex $\mathcal{K}$ is a weighted graph with $n$ vertices and $m$ edges, the problem of computing a shortest (homology) cycle basis is known to be solvable in $O(m^2n/\log n+ n^2m)$-time. Several works \cite{borradaile2017minimum, greedy} have addressed the case when the complex $\mathcal{K}$ is a $2$-manifold. The complexity of these algorithms depends on the rank $g$ of the one-dimensional homology group of $\mathcal{K}$. This rank $g$ has a lower bound of $Θ(n)$, where $n$ denotes the number of simplices in $\mathcal{K}$, giving an $O(n^4)$ worst-case time complexity for the algorithms in \cite{borradaile2017minimum,greedy}. This worst-case complexity is improved in \cite{annotation} to $O(n^ω+ n^2g^{ω-1})$ for general simplicial complexes where $ω< 2.3728639$ \cite{le2014powers} is the matrix multiplication exponent. Taking $g=Θ(n)$, this provides an $O(n^{ω+1})$ worst-case algorithm. In this paper, we improve this time complexity. Combining the divide and conquer technique from \cite{DivideConquer} with the use of annotations from \cite{annotation}, we present an algorithm that runs in $O(n^ω+n^2g)$ time giving the first $O(n^3)$ worst-case algorithm for general complexes. If instead of minimal basis, we settle for an approximate basis, we can improve the running time even further.
We show that a $2$-approximate minimal homology basis can be computed in $O(n^ω\sqrt{n \log n})$ expected time. We also study more general measures for defining the minimal basis and identify reasonable conditions on these measures that allow computing a minimal basis efficiently.

Submitted 20 January, 2018; originally announced January 2018.

Comments: 14 pages, to be presented on LATIN 2018

arXiv:1801.06590 [pdf, other]

Persistent Homology of Morse Decompositions in Combinatorial Dynamics

Authors: Tamal K. Dey, Mateusz Juda, Tomasz Kapela, Jacek Kubica, Michal Lipinski, Marian Mrozek

Abstract: We investigate combinatorial dynamical systems on simplicial complexes considered as {\em finite topological spaces}. Such systems arise in a natural way from sampling dynamics and may be used to reconstruct some features of the dynamics directly from the sample. We study the homological persistence of {\em Morse decompositions} of such systems, an important descriptor of the dynamics, as a tool for validating the reconstruction. Our framework can be viewed as a step toward extending the classical persistence theory to "vector cloud" data. We present experimental results on two numerical examples.

Submitted 11 July, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

arXiv:1707.09904 [pdf, other]

Temporal Hierarchical Clustering

Authors: Tamal K. Dey, Alfred Rossi, Anastasios Sidiropoulos

Abstract: We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric space, which naturally induces a hierarchical clustering of the set of points. We enforce temporal coherence among the embeddings by finding correspondences between successive pairs of ultrametric spaces which exhibit small distortion in the Gromov-Hausdorff sense. We present both upper and lower bounds on the approximability of the resulting optimization problems.

Submitted 19 October, 2017; v1 submitted 31 July, 2017; originally announced July 2017.

Comments: 14 pages, 4 figures

ACM Class: F.2.2; I.5.3

arXiv:1704.05964 [pdf, other]

Temporal Clustering

Authors: Tamal K. Dey, Alfred Rossi, Anastasios Sidiropoulos

Abstract: We study the problem of clustering sequences of unlabeled point sets taken from a common metric space. Such scenarios arise naturally in applications where a system or process is observed in distinct time intervals, such as biological surveys and contagious disease surveillance. In this more general setting existing algorithms for classical (i.e.~static) clustering problems are not applicable anymore. We propose a set of optimization problems which we collectively refer to as 'temporal clustering'. The quality of a solution to a temporal clustering instance can be quantified using three parameters: the number of clusters $k$, the spatial clustering cost $r$, and the maximum cluster displacement $δ$ between consecutive time steps. We consider spatial clustering costs which generalize the well-studied $k$-center, discrete $k$-median, and discrete $k$-means objectives of classical clustering problems. We develop new algorithms that achieve trade-offs between the three objectives $k$, $r$, and $δ$. Our upper bounds are complemented by inapproximability results.

Submitted 19 April, 2017; originally announced April 2017.

Comments: 27 pages, 10 figures

ACM Class: F.2.2; I.5.3

arXiv:1703.07387 [pdf, other]

Topological Analysis of Nerves, Reeb Spaces, Mappers, and Multiscale Mappers

Authors: Tamal K. Dey, Facundo Memoli, Yusu Wang

Abstract: Data analysis often concerns not only the space where data come from, but also various types of maps attached to data. In recent years, several related structures have been used to study maps on data, including Reeb spaces, mappers and multiscale mappers. The construction of these structures also relies on the so-called \emph{nerve} of a cover of the domain. In this paper, we aim to analyze the topological information encoded in these structures in order to provide better understanding of these structures and facilitate their practical usage. More specifically, we show that the one-dimensional homology of the nerve complex $N(\mathcal{U})$ of a path-connected cover $\mathcal{U}$ of a domain $X$ cannot be richer than that of the domain $X$ itself. Intuitively, this result means that no new $H_1$-homology class can be "created" under a natural map from $X$ to the nerve complex $N(\mathcal{U})$. Equipping $X$ with a pseudometric $d$, we further refine this result and characterize the classes of $H_1(X)$ that may survive in the nerve complex using the notion of \emph{size} of the covering elements in $\mathcal{U}$. These fundamental results about nerve complexes then lead to an analysis of the $H_1$-homology of Reeb spaces, mappers and multiscale mappers. The analysis of $H_1$-homology groups unfortunately does not extend to higher dimensions.
Nevertheless, by using a map-induced metric, establishing a Gromov-Hausdorff convergence result between mappers and the domain, and interleaving relevant modules, we can still analyze the persistent homology groups of (multiscale) mappers to establish a connection to Reeb spaces.

Submitted 21 March, 2017; originally announced March 2017.

Comments: Full version of the paper appearing in International Symposium on Computational Geometry, 2017

arXiv:1609.07517 [pdf, other]

doi 10.4230/LIPIcs.ESA.2016.35

SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch-collapse

Authors: Tamal K. Dey, Dayu Shi, Yusu Wang

Abstract: In topological data analysis, point cloud data $P$ extracted from a metric space is often analyzed by computing the persistence diagram or barcodes of a sequence of Rips complexes built on $P$ indexed by a scale parameter. Unfortunately, even for input of moderate size, the size of the Rips complex may become prohibitively large as the scale parameter increases. Starting with the Sparse Rips filtration introduced by Sheehy, some existing methods aim to reduce the size of the complex so as to improve the time efficiency as well. However, as we demonstrate, existing approaches still fall short of scaling well, especially for high dimensional data. In this paper, we investigate the advantages and limitations of existing approaches. Based on insights gained from the experiments, we propose an efficient new algorithm, called SimBa, for approximating the persistent homology of Rips filtrations with quality guarantees. Our new algorithm leverages a batch collapse strategy as well as a new sparse Rips-like filtration. We experiment on a variety of low and high dimensional data sets. We show that our strategy presents a significant size reduction, and our algorithm for approximating Rips filtration persistence is an order of magnitude faster than existing methods in practice.

Submitted 23 September, 2016; originally announced September 2016.

Comments: 15 pages, LIPIcs-Leibniz International Proceedings in Informatics. Vol. 57. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016

ACM Class: F.2.2

arXiv:1511.05479 [pdf, other]

Declutter and Resample: Towards parameter free denoising

Authors: Mickaël Buchet, Tamal K. Dey, Jiayuan Wang, Yusu Wang

Abstract: In many data analysis applications the following scenario is commonplace: we are given a point set that is supposed to sample a hidden ground truth $K$ in a metric space, but it got corrupted with noise so that some of the data points lie far away from $K$ creating outliers also termed as {\em ambient noise}. One of the main goals of denoising algorithms is to eliminate such noise so that the curated data lie within a bounded Hausdorff distance of $K$. Popular denoising approaches such as deconvolution and thresholding often require the user to set several parameters and/or to choose an appropriate noise model while guaranteeing only asymptotic convergence. Our goal is to lighten this burden as much as possible while ensuring theoretical guarantees in all cases. Specifically, first, we propose a simple denoising algorithm that requires only a single parameter but provides a theoretical guarantee on the quality of the output on general input points. We argue that this single parameter cannot be avoided. We next present a simple algorithm that avoids even this parameter by paying for it with a slight strengthening of the sampling condition on the input points which is not unrealistic. We also provide some preliminary empirical evidence that our algorithms are effective in practice.

Submitted 26 March, 2017; v1 submitted 17 November, 2015; originally announced November 2015.

arXiv:1505.06462 [pdf, other]

Parameter-free Topology Inference and Sparsification for Data on Manifolds

Authors: Tamal K. Dey, Zhe Dong, Yusu Wang

Abstract: In topology inference from data, current approaches face two major problems. One concerns the selection of a correct parameter to build an appropriate complex on top of the data points; the other involves the typically `large' size of this complex. We address these two issues in the context of inferring homology from sample points of a smooth manifold of known dimension sitting in a Euclidean space $\mathbb{R}^k$. We show that, for a sample size of $n$ points, we can identify a set of $O(n^2)$ points (as opposed to $O(n^{\lceil \frac{k}{2}\rceil})$ Voronoi vertices) approximating a subset of the medial axis that suffices to compute a distance sandwiched between the well known local feature size and the local weak feature size (in fact, the approximating set can be further reduced in size to $O(n)$). This distance, called the lean feature size, helps pruning the input set at least to the level of local feature size while making the data locally uniform. The local uniformity in turn helps in building a complex for homology inference on top of the sparsified data without requiring any user-supplied distance threshold. Unlike most topology inference results, ours does not require that the input is dense relative to a {\em global} feature such as {\em reach} or {\em weak feature size}; instead it can be adaptive with respect to the local feature size. We present some empirical evidence in support of our theoretical claims.

Submitted 24 May, 2015; originally announced May 2015.

arXiv:1504.03763 [pdf, other]

Mutiscale Mapper: A Framework for Topological Summarization of Data and Maps

Authors: Tamal K. Dey, Facundo Memoli, Yusu Wang

Abstract: Summarizing topological information from datasets and maps defined on them is a central theme in topological data analysis. \textsf{Mapper}, a tool for such summarization, takes as input both a possibly high dimensional dataset and a map defined on the data, and produces a summary of the data by using a cover of the codomain of the map. This cover, via a pullback operation to the domain, produces a simplicial complex connecting the data points. The resulting view of the data through a cover of the codomain offers flexibility in analyzing the data. However, it offers only a view at a fixed scale at which the cover is constructed. Inspired by the concept, we explore a notion of a tower of covers which induces a tower of simplicial complexes connected by simplicial maps, which we call {\em multiscale mapper}. We study the resulting structure, its stability, and design practical algorithms to compute its associated persistence diagrams efficiently. Specifically, when the domain is a simplicial complex and the map is a real-valued piecewise-linear function, the algorithm can compute the exact persistence diagram only from the 1-skeleton of the input complex. For general maps, we present a combinatorial version of the algorithm that acts only on \emph{vertex sets} connected by the 1-skeleton graph, and this algorithm approximates the exact persistence diagram thanks to a stability result that we show to hold. We also relate the multiscale mapper with the Čech complexes arising from a natural pullback pseudometric defined on the input domain.

Submitted 12 January, 2016; v1 submitted 14 April, 2015; originally announced April 2015.

arXiv:1503.07414 [pdf, other]

Comparing Graphs via Persistence Distortion

Authors: Tamal K. Dey, Dayu Shi, Yusu Wang

Abstract: Metric graphs are ubiquitous in science and engineering. For example, many data are drawn from hidden spaces that are graph-like, such as the cosmic web. A metric graph offers one of the simplest yet still meaningful ways to represent the non-linear structure hidden behind the data. In this paper, we propose a new distance between two finite metric graphs, called the persistence-distortion distance, which draws upon a topological idea. This topological perspective along with the metric space viewpoint provide a new angle to the graph matching problem. Our persistence-distortion distance has two properties not shared by previous methods: First, it is stable against the perturbations of the input graph metrics. Second, it is a continuous distance measure, in the sense that it is defined on an alignment of the underlying spaces of input graphs, instead of merely their nodes. This makes our persistence-distortion distance robust against, for example, different discretizations of the same underlying graph. Despite considering the input graphs as continuous spaces, that is, taking all points into account, we show that we can compute the persistence-distortion distance in polynomial time. The time complexity for the discrete case where only graph nodes are considered is much faster. We also provide some preliminary experimental results to demonstrate the use of the new distance measure.

Submitted 1 December, 2017; v1 submitted 25 March, 2015; originally announced March 2015.

Showing 1–50 of 64 results for author: Dey, T K