-
Fast Large Deformation Matching with the Energy Distance Kernel
Authors:
Siwan Boufadene,
François-Xavier Vialard,
Jean Feydy
Abstract:
We propose an efficient framework for point cloud and measure registration using bi-Lipschitz homeomorphisms, achieving O(n log n) complexity, where n is the number of points. By leveraging the Energy-Distance (ED) kernel, which can be approximated by its sliced one-dimensional projections, each computable in O(n log n), our method avoids hyperparameter tuning and enables efficient large-scale optimization. The main issue to be solved is the lack of regularity of the ED kernel. To this end, we introduce two models that regularize the deformations and retain a low computational footprint. The first model relies on TV regularization, while the second model avoids the non-smooth TV regularization at the cost of restricting its use to the space of measures, or point clouds. Finally, we demonstrate the numerical robustness and scalability of our models on synthetic and real data.
Submitted 6 May, 2025;
originally announced May 2025.
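The O(n log n) cost quoted in the abstract above rests on a one-dimensional fact: sums of the form sum_j w_j |x_i - x_j| can be evaluated for all i with one sort and two prefix sums. The sketch below illustrates that idea, together with a Monte Carlo sliced approximation of the Euclidean-norm kernel. It is not the authors' implementation: the function names `abs_kernel_sum_1d` and `sliced_ed_kernel_sum`, the random sampling of directions, and the default number of directions are assumptions made for illustration.

```python
import math
import numpy as np

def abs_kernel_sum_1d(x, w):
    """Return s[i] = sum_j w[j] * |x[i] - x[j]| using one sort and prefix sums."""
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    cw = np.cumsum(ws)         # prefix sums of the weights
    cwx = np.cumsum(ws * xs)   # prefix sums of the weighted positions
    # Points to the left of xs[i] contribute xs[i] - xs[j].
    left = xs * cw - cwx
    # Points to the right of xs[i] contribute xs[j] - xs[i].
    right = (cwx[-1] - cwx) - xs * (cw[-1] - cw)
    out = np.empty_like(left)
    out[order] = left + right  # undo the sort
    return out

def sliced_ed_kernel_sum(X, w, n_dirs=256, seed=0):
    """Approximate s[i] = sum_j w[j] * ||X[i] - X[j]|| by averaging 1-D projections."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_dirs, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on the sphere
    # For theta uniform on the unit sphere, E|<theta, v>| = c_d * ||v||.
    c_d = math.gamma(d / 2) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))
    proj = X @ theta.T  # shape (n, n_dirs)
    total = np.zeros(n)
    for k in range(n_dirs):
        total += abs_kernel_sum_1d(proj[:, k], w)
    return total / (n_dirs * c_d)
```

The 1-D routine is exact; the sliced routine is a Monte Carlo estimate whose error shrinks as the number of directions grows.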
-
Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime
Authors:
Raphaël Barboni,
Gabriel Peyré,
François-Xavier Vialard
Abstract:
We study the convergence of gradient methods for the training of mean-field single hidden layer neural networks with square loss. Observing that this is a separable non-linear least-squares problem which is linear w.r.t. the outer layer's weights, we consider a Variable Projection (VarPro) or two-timescale learning algorithm, thereby eliminating the linear variables and reducing the learning problem to the training of the feature distribution. Whereas most convergence rates for the training of neural networks rely on a neural tangent kernel analysis where features are fixed, we show that such a strategy enables provable convergence rates for the sampling of a teacher feature distribution. Precisely, in the limit where the regularization strength vanishes, we show that the dynamics of the feature distribution correspond to a weighted ultra-fast diffusion equation. Relying on recent results on the asymptotic behavior of such PDEs, we obtain guarantees for the convergence of the trained feature distribution towards the teacher feature distribution in a teacher-student setup.
Submitted 25 April, 2025;
originally announced April 2025.
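The variable projection step described in the abstract above can be illustrated on a finite-width network: for fixed hidden features, the optimal outer weights solve a ridge regression in closed form, which leaves a reduced objective in the features alone. The following is a minimal sketch under assumptions that do not come from the paper: a ReLU feature map, a ridge penalty `lam`, and the hypothetical helper names `features` and `varpro_reduced_loss`.

```python
import numpy as np

def features(X, W):
    """Hidden-layer features relu(X @ W.T), shape (n_samples, n_features). ReLU is an assumption."""
    return np.maximum(X @ W.T, 0.0)

def varpro_reduced_loss(W, X, y, lam=1e-2):
    """Eliminate the outer weights u by ridge regression.

    Minimizes (1/2n) * ||Phi u - y||^2 + (lam/2) * ||u||^2 over u in closed form
    and returns the resulting loss together with the optimal u.
    """
    Phi = features(X, W)
    n, m = Phi.shape
    A = Phi.T @ Phi / n + lam * np.eye(m)       # normal equations of the ridge problem
    u_star = np.linalg.solve(A, Phi.T @ y / n)
    resid = Phi @ u_star - y
    loss = 0.5 * np.mean(resid ** 2) + 0.5 * lam * (u_star @ u_star)
    return loss, u_star
```

In a two-timescale scheme of this kind, the inner weights `W` would then be updated by gradient steps on the reduced loss; that outer loop is omitted here.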
-
Nonnegative cross-curvature in infinite dimensions: synthetic definition and spaces of measures
Authors:
Flavien Léger,
Gabriele Todeschi,
François-Xavier Vialard
Abstract:
Nonnegative cross-curvature (NNCC) is a geometric property of a cost function defined on a product space that originates in optimal transportation and the Ma-Trudinger-Wang theory. Motivated by applications in optimization, gradient flows and mechanism design, we propose a variational formulation of nonnegative cross-curvature on c-convex domains applicable to infinite dimensions and nonsmooth settings. The resulting class of NNCC spaces is closed under Gromov-Hausdorff convergence and for this class, we extend many properties of classical nonnegative cross-curvature: stability under generalized Riemannian submersions, characterization in terms of the convexity of certain sets of c-concave functions, and in the metric case, it is a subclass of positively curved spaces in the sense of Alexandrov. One of our main results is that Wasserstein spaces of probability measures inherit the NNCC property from their base space. Additional examples of NNCC costs include the Bures-Wasserstein and Fisher-Rao squared distances, the Hellinger-Kantorovich squared distance (in some cases), the relative entropy on probability measures, and the 2-Gromov-Wasserstein squared distance on metric measure spaces.
Submitted 19 February, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
-
Entropic Semi-Martingale Optimal Transport
Authors:
Jean-David Benamou,
Guillaume Chazareix,
Marc Hoffmann,
Grégoire Loeper,
François-Xavier Vialard
Abstract:
Entropic Optimal Transport (EOT), also referred to as the Schrödinger problem, seeks to find a random process with prescribed initial/final marginals and with minimal relative entropy with respect to a reference measure. The relative entropy forces the two measures to share the same support and only the drift of the controlled process can be adjusted, the diffusion being imposed by the reference measure. Therefore, at first sight, Semi-Martingale Optimal Transport (SMOT) problems (see [1]) seem out of the scope of applications of entropic regularization techniques, which are otherwise very attractive from a computational point of view. However, when the process is observed only at discrete times, and therefore becomes a Markov chain, its relative entropy can remain finite even with variable diffusion coefficients, and discrete semi-martingales can be obtained as solutions of (multi-marginal) EOT problems. Given a (smooth) semi-martingale, the relative entropy of its time discretizations, scaled by the time step, converges to the so-called "specific relative entropy", a convex functional of its variance process, similar to those used in SMOT. In this paper we use this observation to build an entropic time discretization of continuous SMOT problems. This makes it possible to compute discrete approximations of solutions to continuous SMOT problems by a multi-marginal Sinkhorn algorithm, without the need to solve the non-linear Hamilton-Jacobi-Bellman PDEs associated with the dual problem, as done for example in [1, 2]. We prove a convergence result of the time-discrete entropic problem to the continuous-time problem, propose an implementation, and provide numerical experiments supporting the theoretical convergence.
Submitted 16 December, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
multiGradICON: A Foundation Model for Multimodal Medical Image Registration
Authors:
Basar Demir,
Lin Tian,
Thomas Hastings Greer,
Roland Kwitt,
Francois-Xavier Vialard,
Raul San Jose Estepar,
Sylvain Bouix,
Richard Jarrett Rushmore,
Ebrahim Ebrahim,
Marc Niethammer
Abstract:
Modern medical image registration approaches predict deformations using deep networks. These approaches achieve state-of-the-art (SOTA) registration accuracy and are generally fast. However, deep learning (DL) approaches are, in contrast to conventional non-deep-learning-based approaches, anatomy-specific. Recently, a universal deep registration approach, uniGradICON, has been proposed. However, uniGradICON focuses on monomodal image registration. In this work, we therefore develop multiGradICON as a first step towards universal *multimodal* medical image registration. Specifically, we show that 1) we can train a DL registration model that is suitable for monomodal *and* multimodal registration; 2) loss function randomization can increase multimodal registration accuracy; and 3) training a model with multimodal data helps multimodal generalization. Our code and the multiGradICON model are available at https://github.com/uncbiag/uniGradICON.
Submitted 7 February, 2025; v1 submitted 31 July, 2024;
originally announced August 2024.
-
CARL: A Framework for Equivariant Image Registration
Authors:
Hastings Greer,
Lin Tian,
Francois-Xavier Vialard,
Roland Kwitt,
Raul San Jose Estepar,
Marc Niethammer
Abstract:
Image registration estimates spatial correspondences between a pair of images. These estimates are typically obtained via numerical optimization or regression by a deep network. A desirable property of such estimators is that a correspondence estimate (e.g., the true oracle correspondence) for an image pair is maintained under deformations of the input images. Formally, the estimator should be equivariant to a desired class of image transformations. In this work, we present careful analyses of the desired equivariance properties in the context of multi-step deep registration networks. Based on these analyses we 1) introduce the notions of $[U,U]$ equivariance (network equivariance to the same deformations of the input images) and $[W,U]$ equivariance (where input images can undergo different deformations); we 2) show that in a suitable multi-step registration setup it is sufficient for overall $[W,U]$ equivariance if the first step has $[W,U]$ equivariance and all others have $[U,U]$ equivariance; we 3) show that common displacement-predicting networks only exhibit $[U,U]$ equivariance to translations instead of the more powerful $[W,U]$ equivariance; and we 4) show how to achieve multi-step $[W,U]$ equivariance via a coordinate-attention mechanism combined with displacement-predicting refinement layers (CARL). Overall, our approach obtains excellent practical registration performance on several 3D medical image registration tasks and outperforms existing unsupervised approaches for the challenging problem of abdomen registration.
Submitted 2 April, 2025; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization
Authors:
Ferdinand Genans,
Antoine Godichon-Baggioni,
François-Xavier Vialard,
Olivier Wintenberger
Abstract:
Optimal Transport (OT) based distances are powerful tools for machine learning to compare probability measures and manipulate them using OT maps. In this field, a setting of interest is semi-discrete OT, where the source measure $μ$ is continuous, while the target $ν$ is discrete. Recent works have shown that the minimax rate for the OT map is $\mathcal{O}(t^{-1/2})$ when using $t$ i.i.d. subsamples from each measure (two-sample setting). An open question is whether a better convergence rate can be achieved when the full information of the discrete measure $ν$ is known (one-sample setting). In this work, we answer this question positively by (i) proving an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cell estimation and density support estimation, and (ii) proposing a Stochastic Gradient Descent (SGD) algorithm with adaptive entropic regularization and averaging acceleration. To nearly achieve the desired fast rate, characteristic of non-regular parametric problems, we design an entropic regularization scheme decreasing with the number of samples. Another key step in our algorithm is a projection step that makes it possible to leverage the local strong convexity of the regularized OT problem. Our convergence analysis integrates online convex optimization and stochastic gradient techniques, complemented by the specificities of the OT semi-dual. Moreover, while being as computationally and memory efficient as vanilla SGD, our algorithm achieves the unusual fast rates of our theory in numerical experiments.
Submitted 9 December, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Authors:
Raphaël Barboni,
Gabriel Peyré,
François-Xavier Vialard
Abstract:
We study the convergence of gradient flow for the training of deep neural networks. While Residual Neural Networks are a popular example of very deep architectures, their training constitutes a challenging optimization problem, notably due to the non-convexity and non-coercivity of the objective. Yet, in applications, those tasks are successfully solved by simple optimization algorithms such as gradient descent. To better understand this phenomenon, we focus here on a "mean-field" model of infinitely deep and arbitrarily wide ResNet, parameterized by probability measures over the product set of layers and parameters and with constant marginal on the set of layers. Indeed, in the case of shallow neural networks, mean-field models have proven to benefit from simplified loss landscapes and good theoretical guarantees when trained with gradient flow for the Wasserstein metric on the set of probability measures. Motivated by this approach, we propose to train our model with gradient flow w.r.t. the conditional Optimal Transport distance: a restriction of the classical Wasserstein distance which enforces our marginal condition. Relying on the theory of gradient flows in metric spaces, we first show the well-posedness of the gradient flow equation and its consistency with the training of ResNets at finite width. Performing a local Polyak-Łojasiewicz analysis, we then show convergence of the gradient flow for well-chosen initializations: if the number of features is finite but sufficiently large and the risk is sufficiently small at initialization, the gradient flow converges towards a global minimizer. This is the first result of this type for infinitely deep and arbitrarily wide ResNets.
Submitted 19 March, 2024;
originally announced March 2024.
-
uniGradICON: A Foundation Model for Medical Image Registration
Authors:
Lin Tian,
Hastings Greer,
Roland Kwitt,
Francois-Xavier Vialard,
Raul San Jose Estepar,
Sylvain Bouix,
Richard Rushmore,
Marc Niethammer
Abstract:
Conventional medical image registration approaches directly optimize over the parameters of a transformation model. These approaches have been highly successful and are used generically for registrations of different anatomical regions. Recent deep registration networks are incredibly fast and accurate but are only trained for specific tasks. Hence, they are no longer generic registration approaches. We therefore propose uniGradICON, a first step toward a foundation model for registration providing 1) great performance \emph{across} multiple datasets which is not feasible for current learning-based registration methods, 2) zero-shot capabilities for new registration tasks suitable for different acquisitions, anatomical regions, and modalities compared to the training dataset, and 3) a strong initialization for finetuning on out-of-distribution registration tasks. UniGradICON unifies the speed and accuracy benefits of learning-based registration algorithms with the generic applicability of conventional non-deep-learning approaches. We extensively trained and evaluated uniGradICON on twelve different public datasets. Our code and the uniGradICON model are available at https://github.com/uncbiag/uniGradICON.
Submitted 8 March, 2024;
originally announced March 2024.
-
On the global convergence of Wasserstein gradient flow of the Coulomb discrepancy
Authors:
Siwan Boufadène,
François-Xavier Vialard
Abstract:
In this work, we study the Wasserstein gradient flow of the Riesz energy defined on the space of probability measures. The Riesz kernels define a quadratic functional on the space of measures which is not in general geodesically convex in the Wasserstein geometry; therefore, one cannot conclude global convergence of the Wasserstein gradient flow using standard arguments. Our main result is the exponential convergence of the flow to the minimizer on a closed Riemannian manifold under the condition that the logarithms of the source and target measures are Hölder continuous. To this end, we first prove that the Polyak-Łojasiewicz inequality is satisfied for sufficiently regular solutions. The key regularity result is the global-in-time existence of Hölder solutions if the initial and target data are Hölder continuous, proven either in Euclidean space or on a closed Riemannian manifold. For general measures, we prove using flow interchange techniques that there are no local minima other than the global one for the Coulomb kernel. In fact, we prove that a Lagrangian critical point of the functional for the Coulomb (or Energy distance) kernel is equal to the target everywhere except on singular sets with empty interior. In addition, singular enough measures cannot be critical points.
Submitted 29 January, 2024; v1 submitted 21 November, 2023;
originally announced December 2023.
-
Inverse Consistency by Construction for Multistep Deep Registration
Authors:
Hastings Greer,
Lin Tian,
Francois-Xavier Vialard,
Roland Kwitt,
Sylvain Bouix,
Raul San Jose Estepar,
Richard Rushmore,
Marc Niethammer
Abstract:
Inverse consistency is a desirable property for image registration. We propose a simple technique to make a neural registration network inverse consistent by construction, as a consequence of its structure, as long as it parameterizes its output transform by a Lie group. We extend this technique to multi-step neural registration by composing many such networks in a way that preserves inverse consistency. This multi-step approach also allows for inverse-consistent coarse to fine registration. We evaluate our technique on synthetic 2-D data and four 3-D medical image registration tasks and obtain excellent registration accuracy while assuring inverse consistency.
Submitted 9 October, 2023; v1 submitted 28 April, 2023;
originally announced May 2023.
-
A geometric Laplace method
Authors:
Flavien Léger,
François-Xavier Vialard
Abstract:
A classical tool for approximating integrals is the Laplace method. The first-order, as well as the higher-order, Laplace formula is most often written in coordinates without any geometrical interpretation. In this article, motivated by a situation arising, among others, in optimal transport, we give a geometric formulation of the first-order term of the Laplace method. The central tool is the Kim-McCann Riemannian metric which was introduced in the field of optimal transportation. Our main result expresses the first-order term with standard geometric objects such as volume forms, Laplacians, covariant derivatives and scalar curvatures of two different metrics arising naturally in the Kim-McCann framework. In passing, we give an explicitly quantified version of the Laplace formula, as well as examples of applications.
Submitted 8 December, 2022;
originally announced December 2022.
-
Unbalanced Optimal Transport, from Theory to Numerics
Authors:
Thibault Séjourné,
Gabriel Peyré,
François-Xavier Vialard
Abstract:
Optimal Transport (OT) has recently emerged as a central tool in data sciences to compare point clouds, and more generally probability distributions, in a geometrically faithful way. The wide adoption of OT into existing data analysis and machine learning pipelines is however plagued by several shortcomings. These include its lack of robustness to outliers, its high computational costs, the need for a large number of samples in high dimension and the difficulty of handling data in distinct spaces. In this review, we detail several recently proposed approaches to mitigate these issues. We focus in particular on unbalanced OT, which compares arbitrary positive measures, not restricted to probability distributions (i.e. their total mass can vary). This generalization of OT makes it robust to outliers and missing data. The second workhorse of modern computational OT is entropic regularization, which leads to scalable algorithms while lowering the sample complexity in high dimension. The last point presented in this review is the Gromov-Wasserstein (GW) distance, which extends OT to cope with distributions belonging to different metric spaces. The main motivation for this review is to explain how unbalanced OT, entropic regularization and GW can work hand-in-hand to turn OT into efficient geometric loss functions for data sciences.
Submitted 16 January, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
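Entropic regularization and unbalanced OT, as discussed in the abstract above, are commonly combined in a Sinkhorn-type scaling iteration. The sketch below is a generic textbook-style version, not code from the review: the function name `sinkhorn`, its parameters, and the choice of KL marginal penalties of strength `rho` are assumptions. With `rho=None` the marginals are enforced exactly (balanced OT); with a finite `rho` the scaling updates are damped by the exponent rho / (rho + eps).

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, rho=None, n_iter=500):
    """Entropic OT plan between weight vectors a and b for the cost matrix C.

    rho=None: balanced OT (hard marginal constraints).
    rho>0:    unbalanced OT with KL marginal penalties of strength rho.
    """
    K = np.exp(-C / eps)                          # Gibbs kernel
    power = 1.0 if rho is None else rho / (rho + eps)
    u = np.ones(len(a))
    v = np.ones(len(b))
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power                # row scaling
        v = (b / (K.T @ u)) ** power              # column scaling
    return u[:, None] * K * v[None, :]            # transport plan
```

For small `eps`, a log-domain implementation would be numerically safer than this direct exponentiation; the plain form is kept here for readability.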
-
On the existence of Monge maps for the Gromov-Wasserstein problem
Authors:
Théo Dumont,
Théo Lacombe,
François-Xavier Vialard
Abstract:
The Gromov--Wasserstein problem is a non-convex optimization problem over the polytope of transportation plans between two probability measures supported on two spaces, each equipped with a cost function evaluating similarities between points. Akin to the standard optimal transportation problem, it is natural to ask for conditions guaranteeing some structure on the optimizers, for instance if these are induced by a (Monge) map. We study this question in Euclidean spaces when the cost functions are either given by (i) inner products or (ii) squared distances, two standard choices in the literature. We establish the existence of an optimal map in case (i) and of an optimal 2-map (the union of the graphs of two maps) in case (ii), both under an absolute continuity condition on the source measure. Additionally, in case (ii) and in dimension one, we numerically design situations where optimizers of the Gromov--Wasserstein problem are 2-maps but are not maps. This suggests that our result cannot be improved in general for this cost. Still in dimension one, we additionally establish the optimality of monotone maps under some conditions on the measures, thereby giving insight on why such maps often appear to be optimal in numerical experiments.
Submitted 29 July, 2024; v1 submitted 19 October, 2022;
originally announced October 2022.
-
$\texttt{GradICON}$: Approximate Diffeomorphisms via Gradient Inverse Consistency
Authors:
Lin Tian,
Hastings Greer,
François-Xavier Vialard,
Roland Kwitt,
Raúl San José Estépar,
Richard Jarrett Rushmore,
Nikolaos Makris,
Sylvain Bouix,
Marc Niethammer
Abstract:
We present an approach to learning regular spatial transformations between image pairs in the context of medical image registration. Contrary to optimization-based registration techniques and many modern learning-based methods, we do not directly penalize transformation irregularities but instead promote transformation regularity via an inverse consistency penalty. We use a neural network to predict a map between a source and a target image as well as the map when swapping the source and target images. Different from existing approaches, we compose these two resulting maps and regularize deviations of the $\bf{Jacobian}$ of this composition from the identity matrix. This regularizer -- $\texttt{GradICON}$ -- results in much better convergence when training registration models compared to promoting inverse consistency of the composition of maps directly while retaining the desirable implicit regularization effects of the latter. We achieve state-of-the-art registration performance on a variety of real-world medical image datasets using a single set of hyperparameters and a single non-dataset-specific training protocol.
Submitted 9 October, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Stability and upper bounds for statistical estimation of unbalanced transport potentials
Authors:
Adrien Vacher,
François-Xavier Vialard
Abstract:
In this note, we derive upper bounds on the statistical estimation rates of unbalanced optimal transport (UOT) maps for the quadratic cost. Our work relies on the stability of the semi-dual formulation of optimal transport (OT) extended to the unbalanced case. Depending on the considered variant of UOT, our stability result interpolates between the OT (balanced) case, where the semi-dual is only locally strongly convex with respect to the Sobolev semi-norm $\dot{H}^1$, and the case where it is locally strongly convex with respect to the $H^1$ norm. When the optimal potential belongs to a certain class C with sufficiently low metric entropy, local strong convexity enables us to recover super-parametric rates, faster than $1/\sqrt{n}$.
Submitted 17 March, 2022;
originally announced March 2022.
-
Toric Geometry of Entropic Regularization
Authors:
Bernd Sturmfels,
Simon Telen,
François-Xavier Vialard,
Max von Renesse
Abstract:
Entropic regularization is a method for large-scale linear programming. Geometrically, one traces intersections of the feasible polytope with scaled toric varieties, starting at the Birch point. We compare this to log-barrier methods, with reciprocal linear spaces, starting at the analytic center. We revisit entropic regularization for unbalanced optimal transport, and we develop the use of optimal conic couplings. We compute the degree of the associated toric variety, and we explore algorithms like iterative scaling.
Submitted 10 February, 2023; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe
Authors:
Thibault Séjourné,
François-Xavier Vialard,
Gabriel Peyré
Abstract:
Unbalanced optimal transport (UOT) extends optimal transport (OT) to take into account mass variations when comparing distributions. This is crucial to make OT successful in ML applications, making it robust to data normalization and outliers. The baseline algorithm is Sinkhorn, but its convergence speed might be significantly slower for UOT than for OT. In this work, we identify the cause for this deficiency, namely the lack of a global normalization of the iterates, which equivalently corresponds to a translation of the dual OT potentials. Our first contribution leverages this idea to develop a provably accelerated Sinkhorn algorithm (coined 'translation invariant Sinkhorn') for UOT, bridging the computational gap with OT. Our second contribution focuses on 1-D UOT and proposes a Frank-Wolfe solver applied to this translation invariant formulation. The linear oracle of each step amounts to solving a 1-D OT problem, resulting in a linear time complexity per iteration. Our last contribution extends this method to the computation of UOT barycenters of 1-D measures. Numerical simulations showcase the convergence speed improvement brought by these three approaches.
Submitted 3 January, 2022;
originally announced January 2022.
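The linear-time 1-D OT solve mentioned in the abstract above can be illustrated with a short sketch. This is not the paper's Frank-Wolfe solver: it only shows the classical monotone (north-west corner) coupling between two weighted 1-D measures whose supports are already sorted. The function name `ot_1d` and its arguments are hypothetical, chosen for illustration.

```python
def ot_1d(xs, a, ys, b, p=2):
    """Balanced 1-D OT between sum_i a[i]*delta(xs[i]) and sum_j b[j]*delta(ys[j]).

    Assumptions of this sketch: xs and ys are sorted in ascending order,
    the weights are positive, and sum(a) == sum(b).
    Returns the transport cost for |x - y|**p and the sparse plan as a
    list of (i, j, mass) triplets. Runs in O(len(xs) + len(ys)).
    """
    i, j = 0, 0
    ra, rb = a[0], b[0]          # mass still to be shipped from xs[i] / to ys[j]
    cost, plan = 0.0, []
    while i < len(xs) and j < len(ys):
        m = min(ra, rb)          # ship as much as possible along the monotone coupling
        plan.append((i, j, m))
        cost += m * abs(xs[i] - ys[j]) ** p
        ra -= m
        rb -= m
        if ra <= 1e-15:          # source point exhausted: move to the next one
            i += 1
            ra = a[i] if i < len(xs) else 0.0
        if rb <= 1e-15:          # target point filled: move to the next one
            j += 1
            rb = b[j] if j < len(ys) else 0.0
    return cost, plan


if __name__ == "__main__":
    # One unit of mass at 0 is split between the targets 1 and 3.
    print(ot_1d([0.0], [1.0], [1.0, 3.0], [0.5, 0.5]))
```

Each loop iteration exhausts at least one source or target point, which is where the linear complexity comes from once the supports are sorted.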
-
Regularity theory and geometry of unbalanced optimal transport
Authors:
Thomas Gallouët,
Roberta Ghezzi,
François-Xavier Vialard
Abstract:
Using the dual formulation only, we show that the regularity of unbalanced optimal transport, also called entropy-transport, is inherited from the regularity of standard optimal transport. We provide detailed examples of Riemannian manifolds and costs for which unbalanced optimal transport is regular. Among all entropy-transport formulations, the Wasserstein-Fisher-Rao (WFR) metric, also called Hellinger-Kantorovich, stands out since it admits a dynamic formulation, which extends the Benamou-Brenier formulation of optimal transport. After demonstrating the equivalence between dynamic and static formulations on a closed Riemannian manifold, we prove a polar factorization theorem, similar to the one due to Brenier and McCann. As a byproduct, we formulate the Monge-Ampère equation associated with the WFR metric, which also holds for more general costs. Last, we study the link between c-convex functions for the cost induced by the WFR metric and the cost on the cone. The main result is that the weak Ma-Trudinger-Wang condition on the cone implies the same condition on the manifold for the cost induced by WFR.
Submitted 1 July, 2024; v1 submitted 21 December, 2021;
originally announced December 2021.
-
Parameter tuning and model selection in optimal transport with semi-dual Brenier formulation
Authors:
Adrien Vacher,
François-Xavier Vialard
Abstract:
Over the past few years, numerous computational models have been developed to solve Optimal Transport (OT) in a stochastic setting, where distributions are represented by samples and where the goal is to find the closest map to the ground truth OT map, unknown in practical settings. So far, no quantitative criterion has yet been put forward to tune the parameters of these models and select maps that best approximate the ground truth. To perform this task, we propose to leverage the Brenier formulation of OT. Theoretically, we show that this formulation guarantees that, up to a sharp distortion parameter depending on the smoothness/strong convexity and a statistical deviation term, the selected map achieves the lowest quadratic error to the ground truth. This criterion, estimated via convex optimization, enables parameter tuning and model selection among entropic regularization of OT, input convex neural networks and smooth and strongly convex nearest-Brenier (SSNB) models. We also use this criterion to question the use of OT in Domain-Adaptation (DA). In a standard DA experiment, it enables us to identify the potential that is closest to the true OT map between the source and the target. Yet, we observe that this selected potential is far from being the one that performs best for the downstream transfer classification task.
Submitted 31 January, 2023; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Global convergence of ResNets: From finite to infinite width using linear parameterization
Authors:
Raphaël Barboni,
Gabriel Peyré,
François-Xavier Vialard
Abstract:
Overparametrization is a key factor in the absence of convexity to explain global convergence of gradient descent (GD) for neural networks. Besides the well-studied lazy regime, infinite width (mean field) analysis has been developed for shallow networks, relying on convex optimization techniques. To bridge the gap between the lazy and mean field regimes, we study Residual Networks (ResNets) in which the residual block has linear parametrization while still being nonlinear. Such ResNets admit both infinite depth and width limits, encoding residual blocks in a Reproducing Kernel Hilbert Space (RKHS). In this limit, we prove a local Polyak-Lojasiewicz inequality. Thus, every critical point is a global minimizer and a local convergence result of GD holds, retrieving the lazy regime. In contrast with other mean-field studies, it applies to both parametric and non-parametric cases under an expressivity condition on the residuals. Our analysis leads to a practical and quantified recipe: starting from a universal RKHS, Random Fourier Features are applied to obtain a finite dimensional parameterization satisfying with high probability our expressivity condition.
Submitted 6 February, 2023; v1 submitted 10 December, 2021;
originally announced December 2021.
-
Near-optimal estimation of smooth transport maps with kernel sums-of-squares
Authors:
Boris Muzellec,
Adrien Vacher,
Francis Bach,
François-Xavier Vialard,
Alessandro Rudi
Abstract:
It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to be obtained for the estimated maps themselves. In this paper, we propose the first tractable algorithm for which the statistical $L^2$ error on the maps nearly matches the existing minimax lower-bounds for smooth map estimation. Our method is based on solving the semi-dual formulation of optimal transport with an infinite-dimensional sum-of-squares reformulation, and leads to an algorithm which has dimension-free polynomial rates in the number of samples, with potentially exponentially dimension-dependent constants.
Submitted 29 December, 2021; v1 submitted 3 December, 2021;
originally announced December 2021.
-
ICON: Learning Regular Maps Through Inverse Consistency
Authors:
Hastings Greer,
Roland Kwitt,
Francois-Xavier Vialard,
Marc Niethammer
Abstract:
Learning maps between data samples is fundamental. Applications range from representation learning, image translation and generative modeling, to the estimation of spatial deformations. Such maps relate feature vectors, or map between feature spaces. Well-behaved maps should be regular, which can be imposed explicitly or may emanate from the data itself. We explore what induces regularity for spatial transformations, e.g., when computing image registrations. Classical optimization-based models compute maps between pairs of samples and rely on an appropriate regularizer for well-posedness. Recent deep learning approaches have attempted to avoid using such regularizers altogether by relying on the sample population instead. We explore if it is possible to obtain spatial regularity using an inverse consistency loss only and elucidate what explains map regularity in such a context. We find that deep networks combined with an inverse consistency loss and randomized off-grid interpolation yield well behaved, approximately diffeomorphic, spatial transformations. Despite the simplicity of this approach, our experiments present compelling evidence, on both synthetic and real data, that regular maps can be obtained without carefully tuned explicit regularizers, while achieving competitive registration performance.
Submitted 17 June, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
A Dimension-free Computational Upper-bound for Smooth Optimal Transport Estimation
Authors:
Adrien Vacher,
Boris Muzellec,
Alessandro Rudi,
Francis Bach,
Francois-Xavier Vialard
Abstract:
It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. Despite recent efforts to improve the rate of estimation with the smoothness of the problem, the computational complexity of these recently proposed methods still degrades exponentially with the dimension. In this paper, thanks to an infinite-dimensional sum-of-squares representation, we derive a statistical estimator of smooth optimal transport which achieves a precision $\varepsilon$ from $\tilde{O}(\varepsilon^{-2})$ independent and identically distributed samples from the distributions, for a computational cost of $\tilde{O}(\varepsilon^{-4})$ when the smoothness increases, hence yielding dimension-free statistical and computational rates, with potentially exponentially dimension-dependent constants.
Submitted 1 October, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
-
The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation
Authors:
Thibault Séjourné,
François-Xavier Vialard,
Gabriel Peyré
Abstract:
Comparing metric measure spaces (i.e. a metric space endowed with a probability distribution) is at the heart of many machine learning problems. The most popular distance between such metric measure spaces is the Gromov-Wasserstein (GW) distance, which is the solution of a quadratic assignment problem. The GW distance is however limited to the comparison of metric measure spaces endowed with a probability distribution. To alleviate this issue, we introduce two Unbalanced Gromov-Wasserstein formulations: a distance and a more tractable upper-bounding relaxation. They both allow the comparison of metric spaces equipped with arbitrary positive measures up to isometries. The first formulation is a positive and definite divergence based on a relaxation of the mass conservation constraint using a novel type of quadratically-homogeneous divergence. This divergence works hand in hand with the entropic regularization approach which is popular to solve large scale optimal transport problems. We show that the underlying non-convex optimization problem can be efficiently tackled using a highly parallelizable and GPU-friendly iterative scheme. The second formulation is a distance between mm-spaces up to isometries based on a conic lifting. Lastly, we provide numerical experiments on synthetic examples and domain adaptation data with a Positive-Unlabeled learning task to highlight the salient features of the unbalanced divergence and its potential applications in ML.
Submitted 16 January, 2023; v1 submitted 9 September, 2020;
originally announced September 2020.
-
A Shooting Formulation of Deep Learning
Authors:
François-Xavier Vialard,
Roland Kwitt,
Susan Wei,
Marc Niethammer
Abstract:
Continuous-depth neural networks can be viewed as deep limits of discrete neural networks whose dynamics resemble a discretization of an ordinary differential equation (ODE). Although important steps have been taken to realize the advantages of such continuous formulations, most current techniques are not truly continuous-depth as they assume identical layers. Indeed, existing works throw into relief the myriad difficulties presented by an infinite-dimensional parameter space in learning a continuous-depth neural ODE. To this end, we introduce a shooting formulation which shifts the perspective from parameterizing a network layer-by-layer to parameterizing over optimal networks described only by a set of initial conditions. For scalability, we propose a novel particle-ensemble parametrization which fully specifies the optimal weight trajectory of the continuous-depth neural network. Our experiments show that our particle-ensemble shooting formulation can achieve competitive performance, especially on long-range forecasting tasks. Finally, though the current work is inspired by continuous-depth neural networks, the particle-ensemble shooting formulation also applies to discrete-time networks and may lead to a new fertile area of research in deep learning parametrization.
Submitted 8 December, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
Authors:
Lenaic Chizat,
Pierre Roussillon,
Flavien Léger,
François-Xavier Vialard,
Gabriel Peyré
Abstract:
The squared Wasserstein distance is a natural quantity to compare probability distributions in a non-parametric setting. This quantity is usually estimated with the plug-in estimator, defined via a discrete optimal transport problem which can be solved to $ε$-accuracy by adding an entropic regularization of order $ε$ and using for instance Sinkhorn's algorithm. In this work, we propose instead to estimate it with the Sinkhorn divergence, which is also built on entropic regularization but includes debiasing terms. We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels, of order $ε^{1/2}$, which leads to improved computational complexity bounds and a strong speedup in practice. Our theoretical analysis covers the case of both randomly sampled densities and deterministic discretizations on uniform grids. We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence which enjoys improved statistical and computational efficiency guarantees, under a condition on the regularity of the approximation error, which is in particular satisfied for Gaussian densities. We finally demonstrate the efficiency of the proposed estimators with numerical experiments.
Submitted 29 October, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Sinkhorn Divergences for Unbalanced Optimal Transport
Authors:
Thibault Séjourné,
Jean Feydy,
François-Xavier Vialard,
Alain Trouvé,
Gabriel Peyré
Abstract:
Optimal transport induces the Earth Mover's (Wasserstein) distance between probability distributions, a geometric divergence that is relevant to a wide range of problems. Over the last decade, two relaxations of optimal transport have been studied in depth: unbalanced transport, which is robust to the presence of outliers and can be used when distributions don't have the same total mass; entropy-regularized transport, which is robust to sampling noise and lends itself to fast computations using the Sinkhorn algorithm. This paper combines both lines of work to put robust optimal transport on solid ground. Our main contribution is a generalization of the Sinkhorn algorithm to unbalanced transport: our method alternates between the standard Sinkhorn updates and the pointwise application of a contractive function. This implies that entropic transport solvers on grid images, point clouds and sampled distributions can all be modified easily to support unbalanced transport, with a proof of linear convergence that holds in all settings. We then show how to use this method to define pseudo-distances on the full space of positive measures that satisfy key geometric axioms: (unbalanced) Sinkhorn divergences are differentiable, positive, definite, convex, statistically robust and avoid any "entropic bias" towards a shrinkage of the measures' supports.
Submitted 16 January, 2023; v1 submitted 28 October, 2019;
originally announced October 2019.
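The alternation described in the abstract above (a standard Sinkhorn update followed by a pointwise contractive map) can be sketched in the classical scaling form, assuming a Kullback-Leibler penalty of strength `rho` on both marginals. Under that assumption the contractive map reduces to raising each scaling to the power `rho / (rho + eps)`, which amounts to damping the dual potentials by the same factor. The function name, parameters and defaults below are hypothetical, and the sketch is neither the paper's log-domain implementation nor its debiased divergence.

```python
import math


def unbalanced_sinkhorn(a, b, C, eps=0.1, rho=1.0, n_iter=500):
    """Entropic unbalanced OT plan between weight vectors a and b.

    C is the cost matrix (list of rows), eps the entropic regularization,
    rho the strength of the KL marginal penalties (an assumption of this
    sketch). Returns the plan P = diag(u) K diag(v) as a list of rows.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    tau = rho / (rho + eps)      # tau -> 1 recovers balanced Sinkhorn
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Balanced update a / (K v), then the pointwise damping x -> x**tau.
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** tau
             for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** tau
             for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    x, y = [0.0, 1.0], [0.0, 0.5, 1.0]
    C = [[(xi - yj) ** 2 for yj in y] for xi in x]
    # Total masses differ (1 versus 2): the marginals are only matched softly.
    P = unbalanced_sinkhorn([0.5, 0.5], [0.4, 0.6, 1.0], C)
    print(sum(map(sum, P)))
```

For small `eps` the kernel entries underflow in this form; a log-domain variant on the dual potentials is the usual remedy.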
-
Nonlinear model reduction on metric spaces. Application to one-dimensional conservative PDEs in Wasserstein spaces
Authors:
V. Ehrlacher,
D. Lombardi,
O. Mula,
F. -X. Vialard
Abstract:
We consider the problem of model reduction of parametrized PDEs where the goal is to approximate any function belonging to the set of solutions at a reduced computational cost. For this, the bottom line of most strategies has so far been based on the approximation of the solution set by linear spaces on Hilbert or Banach spaces. This approach can be expected to be successful only when the Kolmogorov width of the set decays fast. While this is the case for certain parabolic or elliptic problems, most transport-dominated problems are expected to present a slowly decaying width and require the study of nonlinear approximation methods. In this work, we propose to address the reduction problem from the perspective of general metric spaces with a suitably defined notion of distance. We develop and compare two different approaches, one based on barycenters and another one using tangent spaces when the metric space has an additional Riemannian structure. As a consequence of working in metric spaces, both approaches are automatically nonlinear. We give theoretical and numerical evidence of their efficiency to reduce complexity for one-dimensional conservative PDEs where the underlying metric space can be chosen to be the $L^2$-Wasserstein space.
Submitted 28 February, 2020; v1 submitted 14 September, 2019;
originally announced September 2019.
-
Metric completion of $Diff([0,1])$ with the $H^1$ right-invariant metric
Authors:
Simone Di Marino,
Andrea Natale,
Rabah Tahraoui,
François-Xavier Vialard
Abstract:
We consider the group of smooth increasing diffeomorphisms Diff on the unit interval endowed with the right-invariant $H^1$ metric. We compute the metric completion of this space which appears to be the space of increasing maps of the unit interval with boundary conditions at $0$ and $1$. We compute the lower-semicontinuous envelope associated with the length minimizing geodesic variational problem. We discuss the Eulerian and Lagrangian formulation of this relaxation and we show that smooth solutions of the EPDiff equation are length minimizing for short times.
Submitted 21 June, 2019;
originally announced June 2019.
-
Region-specific Diffeomorphic Metric Mapping
Authors:
Zhengyang Shen,
François-Xavier Vialard,
Marc Niethammer
Abstract:
We introduce a region-specific diffeomorphic metric mapping (RDMM) registration approach. RDMM is non-parametric, estimating spatio-temporal velocity fields which parameterize the sought-for spatial transformation. Regularization of these velocity fields is necessary. However, while existing non-parametric registration approaches, e.g., the large displacement diffeomorphic metric mapping (LDDMM) model, use a fixed spatially-invariant regularization, our model advects a spatially-varying regularizer with the estimated velocity field, thereby naturally attaching a spatio-temporal regularizer to deforming objects. We explore a family of RDMM registration approaches: 1) a registration model where regions with separate regularizations are pre-defined (e.g., in an atlas space), 2) a registration model where a general spatially-varying regularizer is estimated, and 3) a registration model where the spatially-varying regularizer is obtained via an end-to-end trained deep learning (DL) model. We provide a variational derivation of RDMM, show that the model can assure diffeomorphic transformations in the continuum, and that LDDMM is a particular instance of RDMM. To evaluate RDMM performance we experiment 1) on synthetic 2D data and 2) on two 3D datasets: knee magnetic resonance images (MRIs) of the Osteoarthritis Initiative (OAI) and computed tomography images (CT) of the lung. Results show that our framework achieves state-of-the-art image registration performance, while providing additional information via a learned spatio-temporal regularizer. Further, our deep learning approach allows for very fast RDMM and LDDMM estimations. Our code will be open-sourced. Code is available at https://github.com/uncbiag/registration.
Submitted 8 November, 2019; v1 submitted 31 May, 2019;
originally announced June 2019.
-
Metric Learning for Image Registration
Authors:
Marc Niethammer,
Roland Kwitt,
Francois-Xavier Vialard
Abstract:
Image registration is a key technique in medical image analysis to estimate deformations between image pairs. A good deformation model is important for high-quality estimates. However, most existing approaches use ad-hoc deformation models chosen for mathematical convenience rather than to capture observed data variation. Recent deep learning approaches learn deformation models directly from data. However, they provide limited control over the spatial regularity of transformations. Instead of learning the entire registration approach, we learn a spatially-adaptive regularizer within a registration model. This allows controlling the desired level of regularity and preserving structural properties of a registration model. For example, diffeomorphic transformations can be attained. Our approach is a radical departure from existing deep learning approaches to image registration by embedding a deep learning model in an optimization-based registration algorithm to parameterize and data-adapt the registration model itself.
Submitted 20 April, 2019;
originally announced April 2019.
-
Math in the Black Forest: Workshop on New Directions in Shape Analysis
Authors:
Martin Bauer,
Nicolas Charon,
Philipp Harms,
Boris Khesin,
Alice Le Brigant,
Elodie Maignant,
Stephen Marsland,
Peter Michor,
Xavier Pennec,
Stephen Preston,
Stefan Sommer,
François-Xavier Vialard
Abstract:
These are the proceedings of the workshop "Math in the Black Forest", which brought together researchers in shape analysis to discuss promising new directions. Shape analysis is an inter-disciplinary area of research with theoretical foundations in infinite-dimensional Riemannian geometry, geometric statistics, and geometric stochastics, and with applications in medical imaging, evolutionary development, and fluid dynamics. The workshop is the 6th instance of a series of workshops on the same topic.
Submitted 4 November, 2018;
originally announced November 2018.
-
Interpolating between Optimal Transport and MMD using Sinkhorn Divergences
Authors:
Jean Feydy,
Thibault Séjourné,
François-Xavier Vialard,
Shun-ichi Amari,
Alain Trouvé,
Gabriel Peyré
Abstract:
Comparing probability distributions is a fundamental problem in data sciences. Simple norms and divergences such as the total variation and the relative entropy only compare densities in a point-wise manner and fail to capture the geometric nature of the problem. In sharp contrast, Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between measures that take into account the geometry of the underlying space and metrize the convergence in law.
This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. Relying on a new notion of geometric entropy, we provide theoretical guarantees for these divergences: positivity, convexity and metrization of the convergence in law. On the practical side, we detail a numerical scheme that enables the large scale application of these divergences for machine learning: on the GPU, gradients of the Sinkhorn loss can be computed for batches of a million samples.
Submitted 18 October, 2018;
originally announced October 2018.
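As a reading aid, the divergence studied in the entry above is usually written as follows. The notation ($\alpha$, $\beta$ for the measures, $C$ for the ground cost, $\mathrm{OT}_\varepsilon$ for the entropy-regularized transport cost) is assumed here and does not appear in the abstract, and the two limits are quoted from memory of the standard statement rather than from this listing.

```latex
\mathrm{OT}_\varepsilon(\alpha,\beta)
  = \min_{\pi \in \Pi(\alpha,\beta)} \int C \,\mathrm{d}\pi
    + \varepsilon\, \mathrm{KL}(\pi \,|\, \alpha \otimes \beta),
\qquad
S_\varepsilon(\alpha,\beta)
  = \mathrm{OT}_\varepsilon(\alpha,\beta)
    - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\alpha,\alpha)
    - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\beta,\beta),
\\[4pt]
S_\varepsilon(\alpha,\beta) \xrightarrow[\varepsilon \to 0]{} \mathrm{OT}_0(\alpha,\beta),
\qquad
S_\varepsilon(\alpha,\beta) \xrightarrow[\varepsilon \to +\infty]{}
  \tfrac{1}{2}\,\mathrm{MMD}^2_{-C}(\alpha,\beta).
```

The two subtracted self-transport terms are what guarantee $S_\varepsilon(\alpha,\alpha) = 0$, and the two limits are the sense in which the family interpolates between OT and MMD.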
-
Generalized compressible flows and solutions of the H(div) geodesic problem
Authors:
Thomas Gallouët,
Andrea Natale,
François-Xavier Vialard
Abstract:
We study the geodesic problem on the group of diffeomorphisms of a domain $M \subset \mathbb{R}^d$, equipped with the H(div) metric. The geodesic equations coincide with the Camassa-Holm equation when $d=1$, and represent one of its possible multi-dimensional generalizations when $d>1$. We propose a relaxation à la Brenier of this problem, in which solutions are represented as probability measures on the space of continuous paths on the cone over $M$. We use this relaxation to prove that smooth H(div) geodesics are globally length minimizing for short times. We also prove that there exists a unique pressure field associated to solutions of our relaxation. Finally, we propose a numerical scheme to construct generalized solutions on the cone and present some numerical results illustrating the relation between the generalized Camassa-Holm and incompressible Euler solutions.
Submitted 11 October, 2019; v1 submitted 28 June, 2018;
originally announced June 2018.
-
Embedding Camassa-Holm equations in incompressible Euler
Authors:
François-Xavier Vialard,
Andrea Natale
Abstract:
In this article, we show how to embed the so-called CH2 equations into the geodesic flow of the Hdiv metric in 2D, which, itself, can be embedded in the incompressible Euler equation of a non-compact Riemannian manifold. The method consists in embedding the incompressible Euler equation with a potential term coming from classical mechanics into the incompressible Euler equation of a manifold and seeing the CH2 equation as a particular case of such a fluid dynamics equation.
Submitted 30 April, 2018;
originally announced April 2018.
-
Variational Second-Order Interpolation on the Group of Diffeomorphisms with a Right-Invariant Metric
Authors:
François-Xavier Vialard
Abstract:
In this note, we propose a variational framework in which the minimization of the acceleration on the group of diffeomorphisms endowed with a right-invariant metric is well-posed. It relies on constraining the acceleration to belong to a Sobolev space of higher order than that of the metric in order to gain compactness. It provides a theoretical guarantee of the existence of minimizers, which is necessary for numerical simulations.
Submitted 12 January, 2018;
originally announced January 2018.
-
Second order models for optimal transport and cubic splines on the Wasserstein space
Authors:
Jean-David Benamou,
Thomas Gallouët,
François-Xavier Vialard
Abstract:
On the space of probability densities, we extend the Wasserstein geodesics to the case of higher-order interpolation such as cubic spline interpolation. After presenting the natural extension of cubic splines to the Wasserstein space, we propose a simpler approach based on the relaxation of the variational problem on the path space. We explore two different numerical approaches, one based on multi-marginal optimal transport and entropic regularization and the other based on semi-discrete optimal transport.
Submitted 26 July, 2018; v1 submitted 12 January, 2018;
originally announced January 2018.
-
Optimal Transport for Diffeomorphic Registration
Authors:
Jean Feydy,
Benjamin Charlier,
François-Xavier Vialard,
Gabriel Peyré
Abstract:
This paper introduces the use of unbalanced optimal transport methods as a similarity measure for diffeomorphic matching of imaging data. The similarity measure is a key object in diffeomorphic registration methods that, together with the regularization on the deformation, defines the optimal deformation. Most often, these similarity measures are local or non-local but simple enough to be computationally fast. We build on recent theoretical and numerical advances in optimal transport to propose fast and global similarity measures that can be used on surfaces or volumetric imaging data. This new similarity measure is computed using a fast generalized Sinkhorn algorithm. We apply this new metric in the LDDMM framework on synthetic and real data, fibre bundles and surfaces, and show that better matching results are obtained.
Submitted 16 June, 2017;
originally announced June 2017.
-
Quantum Optimal Transport for Tensor Field Processing
Authors:
Gabriel Peyré,
Lenaïc Chizat,
François-Xavier Vialard,
Justin Solomon
Abstract:
This article introduces a new notion of optimal transport (OT) between tensor fields, which are measures whose values are positive semidefinite (PSD) matrices. This "quantum" formulation of OT (Q-OT) corresponds to a relaxed version of the classical Kantorovich transport problem, where the fidelity between the input PSD-valued measures is captured using the geometry of the von Neumann quantum entropy. We propose a quantum-entropic regularization of the resulting convex optimization problem, which can be solved efficiently using an iterative scaling algorithm. This method is a generalization of the celebrated Sinkhorn algorithm to the quantum setting of PSD matrices. We extend this formulation and the quantum Sinkhorn algorithm to compute barycenters within a collection of input tensor fields. We illustrate the usefulness of the proposed approach on applications to procedural noise generation, anisotropic meshing, diffusion tensor imaging and spectral texture synthesis.
Submitted 23 July, 2017; v1 submitted 20 December, 2016;
originally announced December 2016.
-
The Camassa-Holm equation as an incompressible Euler equation: a geometric point of view
Authors:
Thomas Gallouët,
François-Xavier Vialard
Abstract:
The group of diffeomorphisms of a compact manifold endowed with the L^2 metric acting on the space of probability densities gives a unifying framework for the incompressible Euler equation and the theory of optimal mass transport. Recently, several authors have extended optimal transport to the space of positive Radon measures where the Wasserstein-Fisher-Rao distance is a natural extension of the classical L^2-Wasserstein distance. In this paper, we show a similar relation between this unbalanced optimal transport problem and the Hdiv right-invariant metric on the group of diffeomorphisms, which corresponds to the Camassa-Holm (CH) equation in one dimension. On the optimal transport side, we prove a polar factorization theorem on the automorphism group of half-densities. Geometrically, our point of view provides an isometric embedding of the group of diffeomorphisms endowed with this right-invariant metric in the automorphism group of the fiber bundle of half-densities endowed with an L^2 type of cone metric. This leads to a new formulation of the (generalized) CH equation as a geodesic equation on an isotropy subgroup of this automorphism group. On S^1, solutions to the standard CH equation thus give particular solutions of the incompressible Euler equation on a group of homeomorphisms of R^2 which preserve a radial density that has a singularity at 0. Another application consists in proving that smooth solutions of the Euler-Arnold equation for the Hdiv right-invariant metric are length-minimizing geodesics for sufficiently short times.
Submitted 14 December, 2017; v1 submitted 13 September, 2016;
originally announced September 2016.
-
Scaling Algorithms for Unbalanced Transport Problems
Authors:
Lenaic Chizat,
Gabriel Peyré,
Bernhard Schmitzer,
François-Xavier Vialard
Abstract:
This article introduces a new class of fast algorithms to approximate variational problems involving unbalanced optimal transport. While classical optimal transport considers only normalized probability distributions, it is important for many applications to be able to compute some sort of relaxed transportation between arbitrary positive measures. A generic class of such "unbalanced" optimal transport problems has been recently proposed by several authors. In this paper, we show how to extend the, now classical, entropic regularization scheme to these unbalanced problems. This gives rise to fast, highly parallelizable algorithms that operate by performing only diagonal scaling (i.e. pointwise multiplications) of the transportation couplings. They are generalizations of the celebrated Sinkhorn algorithm. We show how these methods can be used to solve unbalanced transport, unbalanced gradient flows, and to compute unbalanced barycenters. We showcase applications to 2-D shape modification, color transfer, and growth models.
Submitted 22 May, 2017; v1 submitted 20 July, 2016;
originally announced July 2016.
-
Riemannian cubics on the group of diffeomorphisms and the Fisher-Rao metric
Authors:
Rabah Tahraoui,
François-Xavier Vialard
Abstract:
We study a second-order variational problem on the group of diffeomorphisms of the interval [0, 1] endowed with a right-invariant Sobolev metric of order 2, which consists in the minimization of the acceleration. We compute the relaxation of the problem, which involves the so-called Fisher-Rao functional, a convex functional on the space of measures. This relaxation enables the derivation of several optimality conditions and, in particular, a sufficient condition which guarantees that a given path of the initial problem is also a minimizer of the relaxed one. This sufficient condition is related to the existence of a solution to a Riccati equation involving the path acceleration.
Submitted 7 September, 2016; v1 submitted 14 June, 2016;
originally announced June 2016.
-
Unbalanced Optimal Transport: Dynamic and Kantorovich Formulation
Authors:
Lenaic Chizat,
Gabriel Peyré,
Bernhard Schmitzer,
François-Xavier Vialard
Abstract:
This article presents a new class of distances between arbitrary nonnegative Radon measures inspired by optimal transport. These distances are defined by two equivalent alternative formulations: (i) a dynamic formulation defining the distance as a geodesic distance over the space of measures, and (ii) a static "Kantorovich" formulation where the distance is the minimum of an optimization problem over pairs of couplings describing the transfer (transport, creation and destruction) of mass between two measures. Both formulations are convex optimization problems, and the ability to switch from one to the other depending on the targeted application is a crucial property of our models. Of particular interest is the Wasserstein-Fisher-Rao metric recently introduced independently by Chizat et al. and Kondratyev et al. Defined initially through a dynamic formulation, it belongs to this class of metrics and hence automatically benefits from a static Kantorovich formulation.
Submitted 9 February, 2019; v1 submitted 21 August, 2015;
originally announced August 2015.
-
An Interpolating Distance between Optimal Transport and Fisher-Rao
Authors:
Lenaic Chizat,
Bernhard Schmitzer,
Gabriel Peyré,
François-Xavier Vialard
Abstract:
This paper defines a new transport metric over the space of non-negative measures. This metric interpolates between the quadratic Wasserstein and the Fisher-Rao metrics and generalizes optimal transport to measures with different masses. It is defined as a generalization of the dynamical formulation of optimal transport of Benamou and Brenier, by introducing a source term in the continuity equation. The influence of this source term is measured using the Fisher-Rao metric, and is averaged with the transportation term. This gives rise to a convex variational problem defining our metric. Our first contribution is a proof of the existence of geodesics (i.e. solutions to this variational problem). We then show that (generalized) optimal transport and Fisher-Rao metrics are obtained as limiting cases of our metric. Our last theoretical contribution is a proof that geodesics between mixtures of sufficiently close Diracs are made of translating mixtures of Diracs. Lastly, we propose a numerical scheme making use of first order proximal splitting methods and we show an application of this new distance to image interpolation.
Submitted 9 July, 2015; v1 submitted 21 June, 2015;
originally announced June 2015.
-
On Completeness of Groups of Diffeomorphisms
Authors:
Martins Bruveris,
François-Xavier Vialard
Abstract:
We study completeness properties of the Sobolev diffeomorphism groups $\mathcal D^s(M)$ endowed with strong right-invariant Riemannian metrics when the underlying manifold $M$ is $\mathbb R^d$ or compact without boundary. The main result is that for $s > \dim M/2 + 1$, the group $\mathcal D^s(M)$ is geodesically and metrically complete with a surjective exponential map. We then present the connection between the Sobolev diffeomorphism group and the large deformation matching framework in order to apply our results to diffeomorphic image matching.
Submitted 27 January, 2016; v1 submitted 9 March, 2014;
originally announced March 2014.
-
Geodesics on Shape Spaces with Bounded Variation and Sobolev Metrics
Authors:
G. Nardi,
G. Peyré,
F. -X. Vialard
Abstract:
This paper studies the space of $BV^2$ planar curves endowed with the $BV^2$ Finsler metric over its tangent space of displacement vector fields. Such a space is of interest for applications in image processing and computer vision because it enables piecewise regular curves that undergo piecewise regular deformations, such as articulations. The main contribution of this paper is the proof of the existence of the shortest path between any two $BV^2$-curves for this Finsler metric.
Such a result is proved by applying the direct method of the calculus of variations to minimize the geodesic energy. This method applies more generally to similar cases such as the space of curves with $H^k$ metrics for integer $k\geq 2$. This space has a strong Riemannian structure and is geodesically complete. Thus, our result shows that the exponential map is surjective, which is complementary to geodesic completeness in infinite dimensions.
We propose a finite element discretization of the minimal geodesic problem, and use a gradient descent method to compute a stationary point of the energy. Numerical illustrations show the qualitative difference between $BV^2$ and $H^2$ geodesics.
Submitted 22 February, 2016; v1 submitted 26 February, 2014;
originally announced February 2014.
-
Diffeomorphic image matching with left-invariant metrics
Authors:
Tanya Schmah,
Laurent Risser,
François-Xavier Vialard
Abstract:
The geometric approach to diffeomorphic image registration known as "large deformation by diffeomorphic metric mapping" (LDDMM) is based on a left action of diffeomorphisms on images, and a right-invariant metric on a diffeomorphism group, usually defined using a reproducing kernel. We explore the use of left-invariant metrics on diffeomorphism groups, based on reproducing kernels defined in the body coordinates of a source image. This perspective, which we call Left-LDM, allows us to consider non-isotropic spatially-varying kernels, which can be interpreted as describing variable deformability of the source image. We also show a simple relationship between LDDMM and the new approach, implying that spatially-varying kernels are interpretable in the same way in LDDMM. We conclude with a discussion of a class of kernels that enforce a soft mirror-symmetry constraint, which we validate in numerical experiments on a model of a lesioned brain.
Submitted 15 January, 2014;
originally announced January 2014.
-
Piecewise rigid curve deformation via a Finsler steepest descent
Authors:
Guillaume Charpiat,
Giacomo Nardi,
Gabriel Peyré,
François-Xavier Vialard
Abstract:
This paper introduces a novel steepest descent flow in Banach spaces. This extends previous works on generalized gradient descent, notably the work of Charpiat et al., to the setting of Finsler metrics. Such a generalized gradient allows one to take into account a prior on deformations (e.g., piecewise rigid) in order to favor some specific evolutions. We define a Finsler gradient descent method to minimize a functional defined on a Banach space and we prove a convergence theorem for such a method. In particular, we show that the use of non-Hilbertian norms on Banach spaces is useful to study non-convex optimization problems where the geometry of the space might play a crucial role to avoid poor local minima. We show some applications to the curve matching problem. In particular, we characterize piecewise rigid deformations on the space of curves and we study several models to perform piecewise rigid evolution of curves.
Submitted 7 December, 2015; v1 submitted 1 August, 2013;
originally announced August 2013.
-
Bayesian data assimilation in shape registration
Authors:
C. J. Cotter,
S. L. Cotter,
F. -X. Vialard
Abstract:
In this paper we apply a Bayesian framework to the problem of geodesic curve matching. Given a template curve, the geodesic equations provide a mapping from initial conditions for the conjugate momentum onto topologically equivalent shapes. Here, we aim to recover the well-defined posterior distribution on the initial momentum which gives rise to observed points on the target curve; this is achieved by explicitly including a reparameterisation in the formulation. Appropriate priors are chosen for the functions which together determine this field and the positions of the observation points, the initial momentum $p_0$ and the reparameterisation vector field $\nu$, informed by regularity results about the forward model. Having done this, we illustrate how Maximum Likelihood Estimators (MLEs) can be used to find regions of high posterior density, but also how we can apply recently developed Markov chain Monte Carlo (MCMC) methods on function spaces to characterise the whole of the posterior density. These illustrative examples also include scenarios where the posterior distribution is multimodal and irregular, leading us to the conclusion that knowledge of a state of global maximal posterior density does not always give us the whole picture, and full posterior sampling can give better quantification of likely states and the overall uncertainty inherent in the problem.
Submitted 20 December, 2012;
originally announced December 2012.