-
Distributed Computation of Persistent Cohomology
Authors:
Arnur Nigmetov,
Dmitriy Morozov
Abstract:
Persistent (co)homology is a central construction in topological data analysis, where it is used to quantify the prominence of features in data and to produce stable descriptors suitable for downstream analysis. Persistence is challenging to compute in parallel because it relies on global connectivity of the data. We propose a new algorithm to compute persistent cohomology in the distributed setting. It combines domain and range partitioning: the former is used to reduce and sparsify the coboundary matrix locally; after this initial local reduction, we redistribute the matrix across processors for the global reduction. We experimentally compare our cohomology algorithm with DIPHA, the only publicly available code for distributed computation of persistent (co)homology; our algorithm demonstrates significantly better strong scaling.
Submitted 21 October, 2024;
originally announced October 2024.
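For context, distributed persistence algorithms build on a serial baseline: the standard column reduction over $\mathbb{Z}/2$. If you write the coboundary matrix in reverse filtration order, it is the anti-transpose of the boundary matrix, and reducing it yields the same persistence pairs. The sketch below shows only that textbook serial reduction. It is not the distributed domain/range-partitioned algorithm proposed in the paper. The function names `reduce_columns` and `coboundary_pairs` are hypothetical. The sketch assumes simplices are sorted vertex tuples, listed in filtration order with faces before cofaces.

```python
def reduce_columns(columns):
    """Standard left-to-right column reduction over Z/2.

    columns[j] is the set of row indices holding a 1 in column j.
    Returns {j: low(j)} for every column that is nonzero after reduction.
    """
    cols = [set(c) for c in columns]
    owner = {}   # pivot row -> index of the reduced column that owns it
    pivots = {}
    for j, col in enumerate(cols):
        while col:
            low = max(col)
            i = owner.get(low)
            if i is None:          # new pivot: column j is reduced
                owner[low] = j
                pivots[j] = low
                break
            col ^= cols[i]         # add column i to column j (mod 2), in place
    return pivots


def coboundary_pairs(simplices):
    """Persistence pairs from reducing the coboundary matrix (hypothetical helper).

    simplices: sorted vertex tuples in filtration order (faces before cofaces).
    Returns sorted (birth, death) pairs of filtration indices.
    """
    n = len(simplices)
    index = {s: i for i, s in enumerate(simplices)}
    cofaces = [[] for _ in range(n)]
    for j, s in enumerate(simplices):
        if len(s) > 1:
            for k in range(len(s)):
                cofaces[index[s[:k] + s[k + 1:]]].append(j)

    def rev(i):  # reverse filtration order
        return n - 1 - i

    # Column c of the coboundary matrix is simplex rev(c); its rows are its cofaces,
    # also in reverse order. This is the anti-transpose of the boundary matrix.
    columns = [{rev(t) for t in cofaces[rev(c)]} for c in range(n)]
    pivots = reduce_columns(columns)
    return sorted((rev(c), rev(low)) for c, low in pivots.items())
```

For the filled triangle `[(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]`, this returns `[(1, 3), (2, 4), (5, 6)]`, the same pairs that homology gives. Simplex 0 stays unpaired and represents the essential class.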
-
Wilkins: HPC In Situ Workflows Made Easy
Authors:
Orcun Yildiz,
Dmitriy Morozov,
Arnur Nigmetov,
Bogdan Nicolae,
Tom Peterka
Abstract:
In situ approaches can accelerate the pace of scientific discoveries by allowing scientists to perform data analysis at simulation time. Current in situ workflow systems, however, face challenges in handling the growing complexity and diverse computational requirements of scientific tasks. In this work, we present Wilkins, an in situ workflow system that is designed for ease of use while providing scalable and efficient execution of workflow tasks. Wilkins provides a flexible workflow description interface, employs a high-performance data transport layer based on HDF5, and supports tasks with disparate data rates by providing a flow control mechanism. Wilkins seamlessly couples scientific tasks that already use HDF5, without requiring task code modifications. We demonstrate the above features using both synthetic benchmarks and two science use cases in materials science and cosmology.
Submitted 4 April, 2024;
originally announced April 2024.
-
Robustifying State-space Models for Long Sequences via Approximate Diagonalization
Authors:
Annan Yu,
Arnur Nigmetov,
Dmitriy Morozov,
Michael W. Mahoney,
N. Benjamin Erichson
Abstract:
State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have considered a purely diagonal structure. This choice simplifies the implementation, improves computational efficiency, and allows channel communication. However, diagonalizing the HiPPO framework is itself an ill-posed problem. In this paper, we propose a general solution for this and related ill-posed diagonalization problems in machine learning. We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is based on the pseudospectral theory of non-normal operators, and which may be interpreted as the approximate diagonalization of the non-normal matrices defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models. Through theoretical analysis of the transfer functions of different initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization strongly converges to the HiPPO framework, while the S4D/S5 initialization only achieves weak convergence. As a result, our new models show resilience to Fourier-mode noise-perturbed inputs, a crucial property not achieved by the S4D/S5 models. In addition to improved robustness, our S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD methodology helps to improve the accuracy of deep learning models.
Submitted 2 October, 2023;
originally announced October 2023.
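Standard matrix notation captures the general shape of the "perturb-then-diagonalize" idea. This is a schematic in our own notation: the perturbation $E$, the tolerance $\epsilon$, and the use of an induced matrix norm are illustrative assumptions. The abstract does not say how the paper chooses $E$.

```latex
% Exact diagonalization of a non-normal A may need an eigenvector matrix V with a huge
% condition number \kappa(V) = \|V\|\,\|V^{-1}\|. Perturb first, then diagonalize:
\tilde{A} = A + E, \qquad \|E\| < \epsilon, \qquad \tilde{A} = V \Lambda V^{-1},
% so the backward error is \|A - V \Lambda V^{-1}\| = \|E\| < \epsilon, and the computed
% eigenvalues lie in the \epsilon-pseudospectrum of A:
\Lambda(\tilde{A}) \subseteq \Lambda_\epsilon(A)
  = \{\, z \in \mathbb{C} : \|(zI - A)^{-1}\| > 1/\epsilon \,\}.
```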
-
Fast Merge Tree Computation via SYCL
Authors:
Arnur Nigmetov,
Dmitriy Morozov
Abstract:
A merge tree is a topological descriptor of a real-valued function. Merge trees are used in visualization and topological data analysis, either directly or as a means to another end: computing a 0-dimensional persistence diagram, identifying connected components, performing topological simplification, etc.
Scientific computing relies more and more on GPUs to achieve fast, scalable computation. For efficiency, data analysis should take place at the same location as the main computation, which motivates interest in parallel algorithms and portable software for merge trees that can run not only on a CPU, but also on a GPU or other types of accelerators. The SYCL standard defines a programming model that allows the same code, written in standard C++, to target multiple parallel backends (CPUs via OpenMP or TBB, NVIDIA GPUs via CUDA, AMD GPUs via ROCm, Intel GPUs via Level Zero, FPGAs). In this paper, we adapt the triplet merge tree algorithm to SYCL and compare our implementation with the VTK-m implementation, which is the only other implementation of merge trees for GPUs that we know of.
Submitted 27 January, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
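The abstract notes that a merge tree can be used to compute a 0-dimensional persistence diagram. The sketch below illustrates what that diagram records for a function on a path graph. It is a plain serial union-find sweep using the elder rule. It is neither the triplet merge tree algorithm nor a SYCL implementation, and the name `persistence_0d_path` is hypothetical.

```python
def persistence_0d_path(values):
    """0-dimensional sublevel-set persistence of a function on a path graph.

    Vertices are added in increasing order of value; when two components meet,
    the one with the higher minimum dies (elder rule). Returns (birth, death)
    pairs with positive persistence, then the global minimum with death = inf.
    """
    n = len(values)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    active = [False] * n
    pairs = []
    order = sorted(range(n), key=lambda i: (values[i], i))
    for v in order:
        active[v] = True
        for u in (v - 1, v + 1):
            if 0 <= u < n and active[u]:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # Roots are component minima; the younger (higher) minimum dies at v.
                young, old = sorted((ru, rv), key=lambda r: (values[r], r), reverse=True)
                if values[young] < values[v]:
                    pairs.append((values[young], values[v]))
                parent[young] = old
    pairs.append((values[find(order[0])], float("inf")))
    return pairs
```

For example, `persistence_0d_path([1, 3, 0, 2, 4])` returns `[(1, 3), (0, inf)]`. The local minimum at value 1 merges into the component of the global minimum when the sweep reaches value 3. In merge-tree terms, that merge is the internal node at height 3.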
-
Topological Optimization with Big Steps
Authors:
Arnur Nigmetov,
Dmitriy Morozov
Abstract:
Using persistent homology to guide optimization has emerged as a novel application of topological data analysis. Existing methods treat persistence calculation as a black box and backpropagate gradients only onto the simplices involved in particular pairs. We show how the cycles and chains used in the persistence calculation can be used to prescribe gradients to larger subsets of the domain. In particular, we show that in a special case, which serves as a building block for general losses, the problem can be solved exactly in linear time. This relies on another contribution of this paper, which eliminates the need to examine a factorial number of permutations of simplices with the same value. We present empirical experiments that show the practical benefits of our algorithm: the number of steps required for the optimization is reduced by an order of magnitude.
Submitted 2 November, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Topological Regularization via Persistence-Sensitive Optimization
Authors:
Arnur Nigmetov,
Aditi S. Krishnapriyan,
Nicole Sanderson,
Dmitriy Morozov
Abstract:
Optimization, a key tool in machine learning and statistics, relies on regularization to reduce overfitting. Traditional regularization methods control a norm of the solution to ensure its smoothness. Recently, topological methods have emerged as a way to provide more precise and expressive control over the solution, relying on persistent homology to quantify and reduce its roughness. All such existing techniques back-propagate gradients through the persistence diagram, which is a summary of the topological features of a function. Their downside is that they provide information only at the critical points of the function. We propose a method that instead builds on persistence-sensitive simplification and translates the required changes to the persistence diagram into changes on large subsets of the domain, including both critical and regular points. This approach enables faster and more precise topological regularization, the benefits of which we illustrate with experimental evidence.
Submitted 10 November, 2020;
originally announced November 2020.
-
Efficient Approximation of the Matching Distance for 2-parameter persistence
Authors:
Michael Kerber,
Arnur Nigmetov
Abstract:
The matching distance is a computationally tractable topological measure to compare multi-filtered simplicial complexes. We design efficient algorithms for approximating the matching distance of two bi-filtered complexes to any desired precision $ε>0$. Our approach is based on a quad-tree refinement strategy introduced by Biasotti et al., but we recast their approach entirely in geometric terms. This point of view leads to several novel observations resulting in a practically faster algorithm. We demonstrate this speed-up by experimental comparison and provide our code in a public repository; it is the first efficient, publicly available implementation of the matching distance.
Submitted 31 March, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Metric Spaces with Expensive Distances
Authors:
Michael Kerber,
Arnur Nigmetov
Abstract:
In algorithms for finite metric spaces, it is common to assume that the distance between two points can be computed in constant time, and complexity bounds are expressed only in terms of the number of points of the metric space. We introduce a different model where we assume that the computation of a single distance is an expensive operation and, consequently, the goal is to minimize the number of such distance queries. This model is motivated by metric spaces that appear in the context of topological data analysis.
We consider two standard operations on metric spaces, namely the construction of a $(1+\varepsilon)$-spanner and the computation of an approximate nearest neighbor for a given query point. In both cases, we partially explore the metric space through distance queries and infer lower and upper bounds for yet-unexplored distances through the triangle inequality. For spanners, we compare several exploration strategies experimentally. For approximate nearest neighbors, we prove that our strategy returns an approximate nearest neighbor after a logarithmic number of distance queries.
Submitted 25 January, 2019;
originally announced January 2019.
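The abstract mentions inferring lower and upper bounds on unexplored distances through the triangle inequality. The sketch below shows the simplest form of that inference. It caches every answered query and bounds an unknown $d(a,b)$ using any single intermediate point $c$ whose two distances are already known: $|d(a,c) - d(c,b)| \le d(a,b) \le d(a,c) + d(c,b)$. The class name `PartialMetric` and the one-hop rule are illustrative assumptions. The paper's exploration strategies may propagate bounds differently.

```python
import math


class PartialMetric:
    """Hypothetical wrapper: caches expensive distances and bounds unknown ones."""

    def __init__(self, n, dist_fn):
        self.n = n
        self.dist_fn = dist_fn  # expensive distance oracle on point indices
        self.known = {}
        self.queries = 0

    def query(self, a, b):
        """Compute d(a, b) with the oracle, counting each pair at most once."""
        key = (min(a, b), max(a, b))
        if key not in self.known:
            self.known[key] = self.dist_fn(a, b)
            self.queries += 1
        return self.known[key]

    def _get(self, a, b):
        if a == b:
            return 0.0
        return self.known.get((min(a, b), max(a, b)))

    def bounds(self, a, b):
        """(lower, upper) bounds on d(a, b) via one intermediate point, no new queries."""
        d = self._get(a, b)
        if d is not None:
            return d, d
        lo, hi = 0.0, math.inf
        for c in range(self.n):
            dac, dcb = self._get(a, c), self._get(c, b)
            if dac is None or dcb is None:
                continue
            lo = max(lo, abs(dac - dcb))  # reverse triangle inequality
            hi = min(hi, dac + dcb)       # triangle inequality
        return lo, hi
```

For points `[0.0, 1.0, 5.0]` on a line, querying $d(0,1)$ and $d(1,2)$ gives `bounds(0, 2) == (3.0, 5.0)` after only two oracle calls. A strategy can then skip the third query whenever such an interval is already tight enough.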
-
Geometry Helps to Compare Persistence Diagrams
Authors:
Michael Kerber,
Dmitriy Morozov,
Arnur Nigmetov
Abstract:
Exploiting geometric structure to improve the asymptotic complexity of discrete assignment problems is a well-studied subject. In contrast, the practical advantages of using geometry for such problems have not been explored. We implement geometric variants of the Hopcroft--Karp algorithm for bottleneck matching (based on previous work by Efrat et al.) and of the auction algorithm by Bertsekas for Wasserstein distance computation. Both implementations use k-d trees to replace a linear scan with a geometric proximity query. Our interest in this problem stems from the desire to compute distances between persistence diagrams, a problem that comes up frequently in topological data analysis. We show that our geometric matching algorithms lead to a substantial performance gain, both in running time and in memory consumption, over their purely combinatorial counterparts. Moreover, our implementation significantly outperforms the only other implementation available for comparing persistence diagrams.
Submitted 10 June, 2016;
originally announced June 2016.
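As a reference point for the combinatorial baseline the abstract compares against, here is a brute-force sketch of the bottleneck distance between two finite persistence diagrams under the $L_\infty$ ground metric. It uses the standard reduction to bipartite matching: each diagram is augmented with diagonal projections of the other diagram's points. It then binary-searches over candidate costs and tests each threshold with a simple augmenting-path (Kuhn) matcher, not Hopcroft–Karp. The `for j in range(size)` loop inside `augment` is the kind of linear scan that the paper replaces with a k-d tree proximity query. The function name `bottleneck_distance` is hypothetical.

```python
def bottleneck_distance(A, B):
    """Brute-force bottleneck distance between finite diagrams of (birth, death) points."""
    n, m = len(A), len(B)
    size = n + m
    if size == 0:
        return 0.0
    INF = float("inf")

    def cost(i, j):
        # Left i < n: A[i]; left i >= n: diagonal copy of B[i - n].
        # Right j < m: B[j]; right j >= m: diagonal copy of A[j - m].
        if i < n and j < m:
            (a1, a2), (b1, b2) = A[i], B[j]
            return max(abs(a1 - b1), abs(a2 - b2))
        if i < n:  # A[i] may only go to its own diagonal projection
            return (A[i][1] - A[i][0]) / 2 if j - m == i else INF
        if j < m:  # B[j] may only come from its own diagonal projection
            return (B[j][1] - B[j][0]) / 2 if i - n == j else INF
        return 0.0  # diagonal-to-diagonal matches are free

    def has_perfect_matching(eps):
        match_right = [-1] * size

        def augment(i, seen):
            for j in range(size):  # linear scan over all candidate partners
                if not seen[j] and cost(i, j) <= eps:
                    seen[j] = True
                    if match_right[j] == -1 or augment(match_right[j], seen):
                        match_right[j] = i
                        return True
            return False

        return all(augment(i, [False] * size) for i in range(size))

    # The answer is one of the finite edge costs; feasibility is monotone in eps.
    candidates = sorted({cost(i, j) for i in range(size) for j in range(size)} - {INF})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

For example, the diagrams `[(0, 2)]` and `[(0, 3)]` are at bottleneck distance 1. Adding a short-lived point `(1, 2)` to the first diagram of an otherwise identical pair costs only half its persistence, 0.5, because that point is matched to the diagonal.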