-
Subsampling, aligning, and averaging to find circular coordinates in recurrent time series
Authors:
Andrew J. Blumberg,
Mathieu Carrière,
Jun Hou Fung,
Michael A. Mandell
Abstract:
We introduce a new algorithm for finding robust circular coordinates on data that is expected to exhibit recurrence, such as that which appears in neuronal recordings of C. elegans. Techniques exist to create circular coordinates on a simplicial complex from a dimension 1 cohomology class, and these can be applied to the Rips complex of a dataset when it has a prominent class in its dimension 1 cohomology. However, it is known this approach is extremely sensitive to uneven sampling density.
Our algorithm comes with a new method to correct for uneven sampling density, adapting our prior work on averaging coordinates in manifold learning. We use rejection sampling to correct for inhomogeneous sampling and then apply Procrustes matching to align and average the subsamples. In addition to providing a more robust coordinate than other approaches, this subsampling and averaging approach has better efficiency.
We validate our technique on both synthetic data sets and neuronal activity recordings. Our results reveal a topological model of neuronal trajectories for C. elegans that is constructed from loops in which different regions of the brain state space can be mapped to specific and interpretable macroscopic behaviors in the worm.
Submitted 24 December, 2024;
originally announced December 2024.
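The density-correction step mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the neighbor-count density estimate, the acceptance rule, and the names `local_density`, `rejection_subsample`, and `noisy_circle` are all assumptions made for illustration, and the alignment and averaging steps are not shown.

```python
import math
import random


def local_density(points, radius):
    """For each point, count the points within `radius` of it (itself included)."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


def rejection_subsample(points, radius, rng):
    """Keep each point with probability (lowest density) / (its own density).

    Assumed acceptance rule: points in crowded regions are thinned more often,
    so the retained subsample is closer to evenly spread.
    """
    density = local_density(points, radius)
    floor = min(density)
    return [p for p, d in zip(points, density) if rng.random() < floor / d]


def noisy_circle(angles, rng, noise=0.05):
    """Hypothetical test data: points near the unit circle at the given angles."""
    return [
        (math.cos(t) + rng.gauss(0, noise), math.sin(t) + rng.gauss(0, noise))
        for t in angles
    ]


rng = random.Random(0)
# 300 points crowded into a short arc, 100 spread over the whole circle.
angles = [rng.uniform(0.0, 0.5) for _ in range(300)]
angles += [rng.uniform(0.0, 2 * math.pi) for _ in range(100)]
points = noisy_circle(angles, rng)

subsample = rejection_subsample(points, radius=0.3, rng=rng)
print(len(points), "->", len(subsample))
```

Repeating the call with different random seeds would give the collection of subsamples that the abstract describes aligning and averaging.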
-
Resampling and averaging coordinates on data
Authors:
Andrew J. Blumberg,
Mathieu Carriere,
Jun Hou Fung,
Michael A. Mandell
Abstract:
We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach relies on generating many candidate coordinates by subsampling the data and varying hyperparameters of the embedding algorithm (e.g., manifold learning). We then identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis. The final output is the embedding obtained as an average of the representative embeddings using generalized Procrustes analysis. We validate our algorithm on both synthetic data and experimental measurements from genomics, demonstrating robustness to noise and outliers.
Submitted 2 August, 2024;
originally announced August 2024.
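The final averaging step named in the abstract above, generalized Procrustes analysis, can be sketched in a simplified form. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it handles only 2D embeddings, allows rotations but not reflections or rescaling, and assumes every embedding lists the same points in the same order. The function names are hypothetical.

```python
import math


def center(points):
    """Translate a 2D point list so its centroid is at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def rotate(points, theta):
    """Rotate a 2D point list counterclockwise by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def procrustes_rotation(source, target):
    """Angle that best rotates centered `source` onto centered `target`.

    Minimizing the summed squared distances is equivalent to maximizing
    cos(theta) * dot + sin(theta) * cross, which is solved by atan2.
    """
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def procrustes_average(embeddings, iterations=10):
    """Align all embeddings to a running mean shape and return that mean."""
    centered = [center(e) for e in embeddings]
    reference = centered[0]
    for _ in range(iterations):
        aligned = [rotate(e, procrustes_rotation(e, reference)) for e in centered]
        n = len(aligned)
        reference = [
            (sum(e[i][0] for e in aligned) / n, sum(e[i][1] for e in aligned) / n)
            for i in range(len(reference))
        ]
    return reference


# Hypothetical candidate embeddings: one shape seen at three orientations.
base = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0), (0.0, -1.0)]
candidates = [rotate(base, a) for a in (0.3, 1.7, -2.1)]
mean_shape = procrustes_average(candidates)
print(mean_shape)
```

With noisy candidates the mean shape no longer coincides with any single input, which is the case the averaging is meant for.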
-
An Intrinsic Approach to Scalar-Curvature Estimation for Point Clouds
Authors:
Abigail Hickok,
Andrew J. Blumberg
Abstract:
We introduce an intrinsic estimator for the scalar curvature of a data set presented as a finite metric space. Our estimator depends only on the metric structure of the data and not on an embedding in $\mathbb{R}^n$. We show that the estimator is consistent in the sense that for points sampled from a probability measure on a compact Riemannian manifold, the estimator converges to the scalar curvature as the number of points increases. To justify its use in applications, we show that the estimator is stable with respect to perturbations of the metric structure, e.g., noise in the sample or error estimating the intrinsic metric. We validate our estimator experimentally on synthetic data that is sampled from manifolds with specified curvature.
Submitted 4 August, 2023;
originally announced August 2023.
-
MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data
Authors:
Andrew J. Blumberg,
Mathieu Carriere,
Michael A. Mandell,
Raul Rabadan,
Soledad Villar
Abstract:
Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data.
Submitted 20 February, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.
-
Testing to distinguish measures on metric spaces
Authors:
Andrew J. Blumberg,
Prithwish Bhaumik,
Stephen G. Walker
Abstract:
We study the problem of distinguishing between two distributions on a metric space; i.e., given metric measure spaces $({\mathbb X}, d, μ_1)$ and $({\mathbb X}, d, μ_2)$, we are interested in the problem of determining from finite data whether or not $μ_1$ is $μ_2$. The key is to use pairwise distances between observations and, employing a reconstruction theorem of Gromov, we can perform such a test using a two sample Kolmogorov--Smirnov test. A real analysis using phylogenetic trees and flu data is presented.
Submitted 4 February, 2018;
originally announced February 2018.
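The idea in the abstract above, comparing two samples through their pairwise distances with a two-sample Kolmogorov--Smirnov statistic, can be sketched as follows. The abstract does not say exactly which distances enter the test, so this sketch assumes the within-sample pairwise distances of each sample are compared. It computes only the statistic, not a p-value, since pairwise distances are not independent observations. The function names are hypothetical.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """All distances between distinct pairs of points in one sample."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so each ECDF is evaluated at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


rng = random.Random(0)
# Hypothetical samples from two different measures on the plane.
uniform_square = [(rng.random(), rng.random()) for _ in range(60)]
tight_cluster = [(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)) for _ in range(60)]

stat = ks_statistic(pairwise_distances(uniform_square), pairwise_distances(tight_cluster))
print(stat)
```

A statistic near 0 is consistent with the two distance distributions agreeing, and a value near 1 indicates they are well separated.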
-
A polynomial-time relaxation of the Gromov-Hausdorff distance
Authors:
Soledad Villar,
Afonso S. Bandeira,
Andrew J. Blumberg,
Rachel Ward
Abstract:
The Gromov-Hausdorff distance provides a metric on the set of isometry classes of compact metric spaces. Unfortunately, computing this metric directly is believed to be computationally intractable. Motivated by applications in shape matching and point-cloud comparison, we study a semidefinite programming relaxation of the Gromov-Hausdorff metric. This relaxation can be computed in polynomial time, and somewhat surprisingly is itself a pseudometric. We describe the induced topology on the set of compact metric spaces. Finally, we demonstrate the numerical performance of various algorithms for computing the relaxed distance and apply these algorithms to several relevant data sets. In particular we propose a greedy algorithm for finding the best correspondence between finite metric spaces that can handle hundreds of points.
Submitted 18 October, 2016; v1 submitted 17 October, 2016;
originally announced October 2016.