-
An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions
Authors:
Akram Aldroubi,
Rocío Díaz Martín,
Ivan Medri,
Kristofor E. Pas,
Gustavo K. Rohde,
Abu Hasnat Mohammad Rubaiyat
Abstract:
Time Series Classification (TSC) is an important problem with numerous applications in science and technology. Dissimilarity-based approaches, such as Dynamic Time Warping (DTW), are classical methods for distinguishing time series when time deformations are confounding information. In this paper, starting from a deformation-based model for signal classes we define a problem statement for time ser…
▽ More
Time Series Classification (TSC) is an important problem with numerous applications in science and technology. Dissimilarity-based approaches, such as Dynamic Time Warping (DTW), are classical methods for distinguishing time series when time deformations are confounding information. In this paper, starting from a deformation-based model for signal classes we define a problem statement for time series classification problem. We show that, under theoretically ideal conditions, a continuous version of classic 1NN-DTW method can solve the stated problem, even when only one training sample is available. In addition, we propose an alternative dissimilarity measure based on Optimal Transport and show that it can also solve the aforementioned problem statement at a significantly reduced computational cost. Finally, we demonstrate the application of the newly proposed approach in simulated and real time series classification data, showing the efficacy of the method.
△ Less
Submitted 14 May, 2025; v1 submitted 8 May, 2025;
originally announced May 2025.
-
Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data
Authors:
Xinran Liu,
Yikun Bai,
Rocío Díaz Martín,
Kaiwen Shi,
Ashkan Shahbazi,
Bennett A. Landman,
Catie Chang,
Soheil Kolouri
Abstract:
Efficient comparison of spherical probability distributions becomes important in fields such as computer vision, geosciences, and medicine. Sliced optimal transport distances, such as spherical and stereographic spherical sliced Wasserstein distances, have recently been developed to address this need. These methods reduce the computational burden of optimal transport by slicing hyperspheres into o…
▽ More
Efficient comparison of spherical probability distributions becomes important in fields such as computer vision, geosciences, and medicine. Sliced optimal transport distances, such as spherical and stereographic spherical sliced Wasserstein distances, have recently been developed to address this need. These methods reduce the computational burden of optimal transport by slicing hyperspheres into one-dimensional projections, i.e., lines or circles. Concurrently, linear optimal transport has been proposed to embed distributions into \( L^2 \) spaces, where the \( L^2 \) distance approximates the optimal transport distance, thereby simplifying comparisons across multiple distributions. In this work, we introduce the Linear Spherical Sliced Optimal Transport (LSSOT) framework, which utilizes slicing to embed spherical distributions into \( L^2 \) spaces while preserving their intrinsic geometry, offering a computationally efficient metric for spherical probability measures. We establish the metricity of LSSOT and demonstrate its superior computational efficiency in applications such as cortical surface registration, 3D point cloud interpolation via gradient flow, and shape embedding. Our results demonstrate the significant computational benefits and high accuracy of LSSOT in these applications.
△ Less
Submitted 8 November, 2024;
originally announced November 2024.
-
Linear Partial Gromov-Wasserstein Embedding
Authors:
Yikun Bai,
Abihith Kothapalli,
Hengrong Du,
Rocio Diaz Martin,
Soheil Kolouri
Abstract:
The Gromov-Wasserstein (GW) problem, a variant of the classical optimal transport (OT) problem, has attracted growing interest in the machine learning and data science communities due to its ability to quantify similarity between measures in different metric spaces. However, like the classical OT problem, GW imposes an equal mass constraint between measures, which restricts its application in many…
▽ More
The Gromov-Wasserstein (GW) problem, a variant of the classical optimal transport (OT) problem, has attracted growing interest in the machine learning and data science communities due to its ability to quantify similarity between measures in different metric spaces. However, like the classical OT problem, GW imposes an equal mass constraint between measures, which restricts its application in many machine learning tasks. To address this limitation, the partial Gromov-Wasserstein (PGW) problem has been introduced. It relaxes the equal mass constraint, allowing the comparison of general positive Radon measures. Despite this, both GW and PGW face significant computational challenges due to their non-convex nature. To overcome these challenges, we propose the linear partial Gromov-Wasserstein (LPGW) embedding, a linearized embedding technique for the PGW problem. For $K$ different metric measure spaces, the pairwise computation of the PGW distance requires solving the PGW problem ${O}(K^2)$ times. In contrast, the proposed linearization technique reduces this to ${O}(K)$ times. Similar to the linearization technique for the classical OT problem, we prove that LPGW defines a valid metric for metric measure spaces. Finally, we demonstrate the effectiveness of LPGW in practical applications such as shape retrieval and learning with transport-based embeddings, showing that LPGW preserves the advantages of PGW in partial matching while significantly enhancing computational efficiency. The code is available at https://github.com/mint-vu/Linearized_Partial_Gromov_Wasserstein.
△ Less
Submitted 22 March, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Expected Sliced Transport Plans
Authors:
Xinran Liu,
Rocío Díaz Martín,
Yikun Bai,
Ashkan Shahbazi,
Matthew Thorpe,
Akram Aldroubi,
Soheil Kolouri
Abstract:
The optimal transport (OT) problem has gained significant traction in modern machine learning for its ability to: (1) provide versatile metrics, such as Wasserstein distances and their variants, and (2) determine optimal couplings between probability measures. To reduce the computational complexity of OT solvers, methods like entropic regularization and sliced optimal transport have been proposed.…
▽ More
The optimal transport (OT) problem has gained significant traction in modern machine learning for its ability to: (1) provide versatile metrics, such as Wasserstein distances and their variants, and (2) determine optimal couplings between probability measures. To reduce the computational complexity of OT solvers, methods like entropic regularization and sliced optimal transport have been proposed. The sliced OT framework improves efficiency by comparing one-dimensional projections (slices) of high-dimensional distributions. However, despite their computational efficiency, sliced-Wasserstein approaches lack a transportation plan between the input measures, limiting their use in scenarios requiring explicit coupling. In this paper, we address two key questions: Can a transportation plan be constructed between two probability measures using the sliced transport framework? If so, can this plan be used to define a metric between the measures? We propose a "lifting" operation to extend one-dimensional optimal transport plans back to the original space of the measures. By computing the expectation of these lifted plans, we derive a new transportation plan, termed expected sliced transport (EST) plans. We prove that using the EST plan to weight the sum of the individual Euclidean costs for moving from one point to another results in a valid metric between the input discrete probability measures. We demonstrate the connection between our approach and the recently proposed min-SWGG, along with illustrative numerical examples that support our theoretical findings.
△ Less
Submitted 17 October, 2024; v1 submitted 15 October, 2024;
originally announced October 2024.
-
Partial Gromov-Wasserstein Metric
Authors:
Yikun Bai,
Rocio Diaz Martin,
Abihith Kothapalli,
Hengrong Du,
Xinran Liu,
Soheil Kolouri
Abstract:
The Gromov-Wasserstein (GW) distance has gained increasing interest in the machine learning community in recent years, as it allows for the comparison of measures in different metric spaces. To overcome the limitations imposed by the equal mass requirements of the classical GW problem, researchers have begun exploring its application in unbalanced settings. However, Unbalanced GW (UGW) can only be…
▽ More
The Gromov-Wasserstein (GW) distance has gained increasing interest in the machine learning community in recent years, as it allows for the comparison of measures in different metric spaces. To overcome the limitations imposed by the equal mass requirements of the classical GW problem, researchers have begun exploring its application in unbalanced settings. However, Unbalanced GW (UGW) can only be regarded as a discrepancy rather than a rigorous metric/distance between two metric measure spaces (mm-spaces). In this paper, we propose a particular case of the UGW problem, termed Partial Gromov-Wasserstein (PGW). We establish that PGW is a well-defined metric between mm-spaces and discuss its theoretical properties, including the existence of a minimizer for the PGW problem and the relationship between PGW and GW, among others. We then propose two variants of the Frank-Wolfe algorithm for solving the PGW problem and show that they are mathematically and computationally equivalent. Moreover, based on our PGW metric, we introduce the analogous concept of barycenters for mm-spaces. Finally, we validate the effectiveness of our PGW metric and related solvers in applications such as shape matching, shape retrieval, and shape interpolation, comparing them against existing baselines. Our code is available at https://github.com/mint-vu/PGW_Metric.
△ Less
Submitted 27 March, 2025; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Stereographic Spherical Sliced Wasserstein Distances
Authors:
Huy Tran,
Yikun Bai,
Abihith Kothapalli,
Ashkan Shahbazi,
Xinran Liu,
Rocio Diaz Martin,
Soheil Kolouri
Abstract:
Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spheri…
▽ More
Comparing spherical probability distributions is of great interest in various fields, including geology, medical domains, computer vision, and deep representation learning. The utility of optimal transport-based distances, such as the Wasserstein distance, for comparing probability measures has spurred active research in developing computationally efficient variations of these distances for spherical probability measures. This paper introduces a high-speed and highly parallelizable distance for comparing spherical measures using the stereographic projection and the generalized Radon transform, which we refer to as the Stereographic Spherical Sliced Wasserstein (S3W) distance. We carefully address the distance distortion caused by the stereographic projection and provide an extensive theoretical analysis of our proposed metric and its rotationally invariant variation. Finally, we evaluate the performance of the proposed metrics and compare them with recent baselines in terms of both speed and accuracy through a wide range of numerical studies, including gradient flows and self-supervised learning. Our code is available at https://github.com/mint-vu/s3wd.
△ Less
Submitted 9 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
LCOT: Linear circular optimal transport
Authors:
Rocio Diaz Martin,
Ivan Medri,
Yikun Bai,
Xinran Liu,
Kangbai Yan,
Gustavo K. Rohde,
Soheil Kolouri
Abstract:
The optimal transport problem for measures supported on non-Euclidean spaces has recently gained ample interest in diverse applications involving representation learning. In this paper, we focus on circular probability measures, i.e., probability measures supported on the unit circle, and introduce a new computationally efficient metric for these measures, denoted as Linear Circular Optimal Transp…
▽ More
The optimal transport problem for measures supported on non-Euclidean spaces has recently gained ample interest in diverse applications involving representation learning. In this paper, we focus on circular probability measures, i.e., probability measures supported on the unit circle, and introduce a new computationally efficient metric for these measures, denoted as Linear Circular Optimal Transport (LCOT). The proposed metric comes with an explicit linear embedding that allows one to apply Machine Learning (ML) algorithms to the embedded measures and seamlessly modify the underlying metric for the ML algorithm to LCOT. We show that the proposed metric is rooted in the Circular Optimal Transport (COT) and can be considered the linearization of the COT metric with respect to a fixed reference measure. We provide a theoretical analysis of the proposed metric and derive the computational complexities for pairwise comparison of circular probability measures. Lastly, through a set of numerical experiments, we demonstrate the benefits of LCOT in learning representations of circular measures.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
Majorana Demonstrator Data Release for AI/ML Applications
Authors:
I. J. Arnquist,
F. T. Avignone III,
A. S. Barabash,
C. J. Barton,
K. H. Bhimani,
E. Blalock,
B. Bos,
M. Busch,
M. Buuck,
T. S. Caldwell,
Y. -D. Chan,
C. D. Christofferson,
P. -H. Chu,
M. L. Clark,
C. Cuesta,
J. A. Detwiler,
Yu. Efremenko,
H. Ejiri,
S. R. Elliott,
N. Fuad,
G. K. Giovanetti,
M. P. Green,
J. Gruszko,
I. S. Guinn,
V. E. Guiseppe
, et al. (35 additional authors not shown)
Abstract:
The enclosed data release consists of a subset of the calibration data from the Majorana Demonstrator experiment. Each Majorana event is accompanied by raw Germanium detector waveforms, pulse shape discrimination cuts, and calibrated final energies, all shared in an HDF5 file format along with relevant metadata. This release is specifically designed to support the training and testing of Artificia…
▽ More
The enclosed data release consists of a subset of the calibration data from the Majorana Demonstrator experiment. Each Majorana event is accompanied by raw Germanium detector waveforms, pulse shape discrimination cuts, and calibrated final energies, all shared in an HDF5 file format along with relevant metadata. This release is specifically designed to support the training and testing of Artificial Intelligence (AI) and Machine Learning (ML) algorithms upon our data. This document is structured as follows. Section I provides an overview of the dataset's content and format; Section II outlines the location of this dataset and the method for accessing it; Section III presents the NPML Machine Learning Challenge associated with this dataset; Section IV contains a disclaimer from the Majorana collaboration regarding the use of this dataset; Appendix A contains technical details of this data release. Please direct questions about the material provided within this release to [email protected] (A. Li).
△ Less
Submitted 14 September, 2023; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Linear Optimal Partial Transport Embedding
Authors:
Yikun Bai,
Ivan Medri,
Rocio Diaz Martin,
Rana Muhammad Shahroz Khan,
Soheil Kolouri
Abstract:
Optimal transport (OT) has gained popularity due to its various applications in fields such as machine learning, statistics, and signal processing. However, the balanced mass requirement limits its performance in practical problems. To address these limitations, variants of the OT problem, including unbalanced OT, Optimal partial transport (OPT), and Hellinger Kantorovich (HK), have been proposed.…
▽ More
Optimal transport (OT) has gained popularity due to its various applications in fields such as machine learning, statistics, and signal processing. However, the balanced mass requirement limits its performance in practical problems. To address these limitations, variants of the OT problem, including unbalanced OT, Optimal partial transport (OPT), and Hellinger Kantorovich (HK), have been proposed. In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem. The proposed embedding allows for faster computation of OPT distance between pairs of positive measures. Besides our theoretical contributions, we demonstrate the LOPT embedding technique in point-cloud interpolation and PCA analysis.
△ Less
Submitted 23 April, 2024; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Interpretable Boosted Decision Tree Analysis for the Majorana Demonstrator
Authors:
I. J. Arnquist,
F. T. Avignone III,
A. S. Barabash,
C. J. Barton,
K. H. Bhimani,
E. Blalock,
B. Bos,
M. Busch,
M. Buuck,
T. S. Caldwell,
Y -D. Chan,
C. D. Christofferson,
P. -H. Chu,
M. L. Clark,
C. Cuesta,
J. A. Detwiler,
Yu. Efremenko,
S. R. Elliott,
G. K. Giovanetti,
M. P. Green,
J. Gruszko,
I. S. Guinn,
V. E. Guiseppe,
C. R. Haufe,
R. Henning
, et al. (30 additional authors not shown)
Abstract:
The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high purity germanium detectors (HPGe). Machine learning provides a new way to maximize the amount of information provided by these detectors, but the data-driven nature makes it less interpretable compared to traditional analysis. An interpretability study reveals the machine's decision-making logi…
▽ More
The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high purity germanium detectors (HPGe). Machine learning provides a new way to maximize the amount of information provided by these detectors, but the data-driven nature makes it less interpretable compared to traditional analysis. An interpretability study reveals the machine's decision-making logic, allowing us to learn from the machine to feedback to the traditional analysis. In this work, we have presented the first machine learning analysis of the data from the Majorana Demonstrator; this is also the first interpretable machine learning analysis of any germanium detector experiment. Two gradient boosted decision tree models are trained to learn from the data, and a game-theory-based model interpretability study is conducted to understand the origin of the classification power. By learning from data, this analysis recognizes the correlations among reconstruction parameters to further enhance the background rejection performance. By learning from the machine, this analysis reveals the importance of new background categories to reciprocally benefit the standard Majorana analysis. This model is highly compatible with next-generation germanium detector experiments like LEGEND since it can be simultaneously trained on a large number of detectors.
△ Less
Submitted 21 August, 2024; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Signed Cumulative Distribution Transform for Parameter Estimation of 1-D Signals
Authors:
Sumati Thareja,
Gustavo Rohde,
Rocio Diaz Martin,
Ivan Medri,
Akram Aldroubi
Abstract:
We describe a method for signal parameter estimation using the signed cumulative distribution transform (SCDT), a recently introduced signal representation tool based on optimal transport theory. The method builds upon signal estimation using the cumulative distribution transform (CDT) originally introduced for positive distributions. Specifically, we show that Wasserstein-type distance minimizati…
▽ More
We describe a method for signal parameter estimation using the signed cumulative distribution transform (SCDT), a recently introduced signal representation tool based on optimal transport theory. The method builds upon signal estimation using the cumulative distribution transform (CDT) originally introduced for positive distributions. Specifically, we show that Wasserstein-type distance minimization can be performed simply using linear least squares techniques in SCDT space for arbitrary signal classes, thus providing a global minimizer for the estimation problem even when the underlying signal is a nonlinear function of the unknown parameters. Comparisons to current signal estimation methods using $L_p$ minimization shows the advantage of the method.
△ Less
Submitted 16 July, 2022;
originally announced July 2022.
-
The Signed Cumulative Distribution Transform for 1-D Signal Analysis and Classification
Authors:
Akram Aldroubi,
Rocio Diaz Martin,
Ivan Medri,
Gustavo K. Rohde,
Sumati Thareja
Abstract:
This paper presents a new mathematical signal transform that is especially suitable for decoding information related to non-rigid signal displacements. We provide a measure theoretic framework to extend the existing Cumulative Distribution Transform [ACHA 45 (2018), no. 3, 616-641] to arbitrary (signed) signals on $\overline{\mathbb{R}}$. We present both forward (analysis) and inverse (synthesis)…
▽ More
This paper presents a new mathematical signal transform that is especially suitable for decoding information related to non-rigid signal displacements. We provide a measure theoretic framework to extend the existing Cumulative Distribution Transform [ACHA 45 (2018), no. 3, 616-641] to arbitrary (signed) signals on $\overline{\mathbb{R}}$. We present both forward (analysis) and inverse (synthesis) formulas for the transform, and describe several of its properties including translation, scaling, convexity, linear separability and others. Finally, we describe a metric in transform space, and demonstrate the application of the transform in classifying (detecting) signals under random displacements.
△ Less
Submitted 3 June, 2021;
originally announced June 2021.