-
Synthesis and Analysis of Data as Probability Measures with Entropy-Regularized Optimal Transport
Authors:
Brendan Mallery,
James M. Murphy,
Shuchin Aeron
Abstract:
We consider synthesis and analysis of probability measures using the entropy-regularized Wasserstein-2 cost and its unbiased version, the Sinkhorn divergence. The synthesis problem consists of computing the barycenter, with respect to these costs, of reference measures given a set of coefficients belonging to the simplex. The analysis problem consists of finding the coefficients for the closest barycenter in the Wasserstein-2 distance to a given measure. Under the weakest assumptions on the measures thus far in the literature, we compute the derivative of the entropy-regularized Wasserstein-2 cost. We leverage this to establish a characterization of barycenters with respect to the entropy-regularized Wasserstein-2 cost as solutions that correspond to a fixed point of an average of the entropy-regularized displacement maps. This characterization yields a finite-dimensional, convex, quadratic program for solving the analysis problem when the measure being analyzed is a barycenter with respect to the entropy-regularized Wasserstein-2 cost. We show that these coefficients, as well as the value of the barycenter functional, can be estimated from samples with dimension-independent rates of convergence, and that barycentric coefficients are stable with respect to perturbations in the Wasserstein-2 metric. We employ the barycentric coefficients as features for classification of corrupted point cloud data, and show that compared to neural network baselines, our approach is more efficient in small training data regimes.
Submitted 23 March, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
Linearized Wasserstein Barycenters: Synthesis, Analysis, Representational Capacity, and Applications
Authors:
Matthew Werenski,
Brendan Mallery,
Shuchin Aeron,
James M. Murphy
Abstract:
We propose the linear barycentric coding model (LBCM) which utilizes the linear optimal transport (LOT) metric for analysis and synthesis of probability measures. We provide a closed-form solution to the variational problem characterizing the probability measures in the LBCM and establish equivalence of the LBCM to the set of 2-Wasserstein barycenters in the special case of compatible measures. Computational methods for synthesizing and analyzing measures in the LBCM are developed with finite sample guarantees. One of our main theoretical contributions is to identify an LBCM, expressed in terms of a simple family, which is sufficient to express all probability measures on the closed unit interval. We show that a natural analogous construction of an LBCM in 2 dimensions fails, and we leave it as an open problem to identify the proper extension in more than 1 dimension. We conclude by demonstrating the utility of LBCM for covariance estimation and data imputation.
Submitted 7 April, 2025; v1 submitted 30 October, 2024;
originally announced October 2024.
-
A Lattice-Theoretic Perspective on the Persistence Map
Authors:
Brendan Mallery,
Adélie Garin,
Justin Curry
Abstract:
We provide a naturally isomorphic description of the persistence map from merge trees to barcodes in terms of a monotone map from the partition lattice to the subset lattice. Our description is local, which offers the potential to speed up inverse computations, and brings classical tools in combinatorics to bear on an active area of research in topological data analysis (TDA).
Submitted 1 March, 2022;
originally announced March 2022.